Chapter 4. Larger System Examples I

4.1 "Splits and Joins and Alien Invasions"

This chapter and the next continue our look at the system utilities domain in Python. They present a collection of larger Python scripts that do real systems work -- comparing and copying directory trees, splitting files, searching files and directories, testing other programs, configuring program shell environments, launching web browsers, and so on. To make this collection easier to absorb, it's been split into a two-chapter set. This chapter presents assorted Python system utility programs that illustrate typical tasks and techniques in this domain. The next chapter presents larger Python programs that focus on more advanced file and directory tree processing.

Although the main point of these two case-study chapters is to give you a feel for realistic scripts in action, the size of these examples also gives us an opportunity to see Python's support for development paradigms like OOP and reuse at work. It's really only in the context of nontrivial programs like the ones we'll meet here that such tools begin to bear tangible fruit. These chapters also emphasize the "why" of systems tools, not just the "how" -- along the way, I'll point out real-world needs met by the examples we'll study, to help you put the details in context.

One note up front: these chapters move quickly, and a few of their examples are largely just listed for independent study. Because the scripts here are all heavily documented and use Python system tools described in the prior two chapters, I won't go through all code in detail. You should read the source code listings and experiment with these programs on your own computer, to get a better feel for how to combine system interfaces to accomplish realistic tasks. They are all available in source code form on the book's CD-ROM (view CD-ROM content online at http://examples.oreilly.com/python2), and most work on all major platforms.

I should also mention that these are programs I really use -- not examples written just for this book. In fact, they were coded over years and perform widely differing tasks, so there is no obvious common thread to connect the dots here. On the other hand, they help explain why system tools are useful in the first place, demonstrate larger development concepts that simpler examples cannot, and bear collective witness to the simplicity and portability of automating system tasks with Python. Once you've mastered the basics, you'll probably wish you had done so sooner.

4.2 Splitting and Joining Files

Like most kids, mine spend a lot of time on the Internet. As far as I can tell, it's the thing to do these days. Among this latest generation, computer geeks and gurus seem to be held in the same sort of esteem that rock stars once were by mine. When kids disappear into their rooms, chances are good that they are hacking on computers, not mastering guitar riffs. It's probably healthier than some of the diversions of my own misspent youth, but that's a topic for another kind of book.

But if you have teenage kids and computers, or know someone who does, you probably know that it's not a bad idea to keep tabs on what those kids do on the Web. Type your favorite four-letter word in almost any web search engine and you'll understand the concern -- it's much better stuff than I could get during my teenage career. To sidestep the issue, only a few of the machines in my house have Internet feeds.

Now, while they're on one of these machines, my kids download lots of games. To avoid infecting our Very Important Computers with viruses from public-domain games, though, my kids usually have to download games on a computer with an Internet feed, and transfer them to their own computers to install. The problem is that game files are not small; they are usually much too big to fit on a floppy (and burning a CD takes away valuable game playing time).

If all the machines in my house ran Linux, this would be a nonissue. There are standard command-line programs on Unix for chopping a file into pieces small enough to fit on a floppy (split), and others for putting the pieces back together to recreate the original file (cat). Because we have all sorts of different machines in the house, though, we needed a more portable solution.

4.2.1 Splitting Files Portably

Since all the computers in my house run Python, a simple portable Python script came to the rescue. The Python program in Example 4-1 distributes a single file's contents among a set of part files, and stores those part files in a directory.

Example 4-1. PP2E\System\Filetools\split.py

#!/usr/bin/python
#########################################################
# split a file into a set of portions; join.py puts them
# back together; this is a customizable version of the 
# standard unix split command-line utility; because it
# is written in Python, it also works on Windows and can
# be easily tweaked; because it exports a function, it 
# can also be imported and reused in other applications;
#########################################################

import sys, os
kilobytes = 1024
megabytes = kilobytes * 1000
chunksize = int(1.4 * megabytes)                   # default: roughly a floppy

def split(fromfile, todir, chunksize=chunksize): 
    if not os.path.exists(todir):                  # caller handles errors
        os.mkdir(todir)                            # make dir, read/write parts
    else:
        for fname in os.listdir(todir):            # delete any existing files
            os.remove(os.path.join(todir, fname)) 
    partnum = 0
    input = open(fromfile, 'rb')                   # use binary mode on Windows
    while 1:                                       # eof=empty string from read
        chunk = input.read(chunksize)              # get next part <= chunksize
        if not chunk: break
        partnum  = partnum+1
        filename = os.path.join(todir, ('part%04d' % partnum))
        fileobj  = open(filename, 'wb')
        fileobj.write(chunk)
        fileobj.close()                            # or simply open(  ).write(  )
    input.close(  )
    assert partnum <= 9999                         # join sort fails if 5 digits
    return partnum
            
if __name__ == '__main__':
    if len(sys.argv) == 2 and sys.argv[1] == '-help':
        print 'Use: split.py [file-to-split target-dir [chunksize]]'
    else:
        if len(sys.argv) < 3:
            interactive = 1
            fromfile = raw_input('File to be split? ')       # input if clicked 
            todir    = raw_input('Directory to store part files? ')
        else:
            interactive = 0
            fromfile, todir = sys.argv[1:3]                  # args in cmdline
            if len(sys.argv) == 4: chunksize = int(sys.argv[3])
        absfrom, absto = map(os.path.abspath, [fromfile, todir])
        print 'Splitting', absfrom, 'to', absto, 'by', chunksize

        try:
            parts = split(fromfile, todir, chunksize)
        except:
            print 'Error during split:'
            print sys.exc_type, sys.exc_value
        else:
            print 'Split finished:', parts, 'parts are in', absto
        if interactive: raw_input('Press Enter key') # pause if clicked

By default, this script splits the input file into chunks that are roughly the size of a floppy disk -- perfect for moving big files between electronically isolated machines. Most important, because this is all portable Python code, this script will run on just about any machine, even ones without a file splitter of their own. All it requires is an installed Python. Here it is at work splitting the Python 1.5.2 self-installer executable on Windows:

C:\temp>echo %X%               shorthand shell variable
C:\PP2ndEd\examples\PP2E

C:\temp>ls -l py152.exe 
-rwxrwxrwa   1 0        0        5028339 Apr 16  1999 py152.exe

C:\temp>python %X%\System\Filetools\split.py -help 
Use: split.py [file-to-split target-dir [chunksize]]

C:\temp>python %X%\System\Filetools\split.py py152.exe pysplit 
Splitting C:\temp\py152.exe to C:\temp\pysplit by 1433600
Split finished: 4 parts are in C:\temp\pysplit

C:\temp>ls -l pysplit 
total 9821
-rwxrwxrwa   1 0        0        1433600 Sep 12 06:03 part0001
-rwxrwxrwa   1 0        0        1433600 Sep 12 06:03 part0002
-rwxrwxrwa   1 0        0        1433600 Sep 12 06:03 part0003
-rwxrwxrwa   1 0        0         727539 Sep 12 06:03 part0004

Each of these four generated part files represents one binary chunk of file py152.exe, small enough to fit comfortably on a floppy disk. In fact, if you add the sizes of the generated part files given by the ls command, you'll come up with 5,028,339 bytes -- exactly the same as the original file's size. Before we see how to put these files back together again, let's explore a few of the splitter script's finer points.
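The size arithmetic above is easy to check by hand or in code. The following is a minimal sketch in Python 3 syntax (the listings in this chapter predate it); the helper name part_sizes is hypothetical and is not part of split.py:

```python
def part_sizes(total, chunksize):
    """Return the sizes of the part files that splitting `total` bytes would produce."""
    full, rest = divmod(total, chunksize)      # whole chunks, leftover bytes
    return [chunksize] * full + ([rest] if rest else [])

sizes = part_sizes(5028339, 1433600)           # py152.exe size, default chunk size
print(len(sizes), sizes[-1], sum(sizes))       # prints: 4 727539 5028339
```

Three full chunks of 1,433,600 bytes account for 4,300,800 bytes, and the last part holds the remaining 727,539.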

4.2.1.1 Operation modes

This script is designed to input its parameters in either interactive or command-line modes; it checks the number of command-line arguments to know in which mode it is being used. In command-line mode, you list the file to be split and the output directory on the command line, and can optionally override the default part file size with a third command-line argument.

In interactive mode, the script asks for a filename and output directory at the console window with raw_input, and pauses for a keypress at the end before exiting. This mode is nice when the program file is started by clicking on its icon -- on Windows, parameters are typed into a pop-up DOS box that doesn't automatically disappear. The script also shows the absolute paths of its parameters (by running them through os.path.abspath) because they may not be obvious in interactive mode. We'll see examples of other split modes at work in a moment.
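To make the mode selection concrete, here is a rough sketch of the same argument-count test packaged as a function, written in Python 3 syntax. The function name get_params and its ask parameter are assumptions made for illustration; split.py itself does this inline in its self-test code:

```python
def get_params(argv, ask=input, default_chunksize=1433600):
    """Return (fromfile, todir, chunksize, interactive) for one split run."""
    chunksize = default_chunksize
    if len(argv) < 3:                          # too few args: prompt at the console
        interactive = True
        fromfile = ask('File to be split? ')
        todir = ask('Directory to store part files? ')
    else:                                      # file and directory on the command line
        interactive = False
        fromfile, todir = argv[1:3]
        if len(argv) == 4:                     # optional third argument: part size
            chunksize = int(argv[3])
    return fromfile, todir, chunksize, interactive

# argv[0] is the script name, just as in sys.argv
print(get_params(['split.py', 'py152.exe', 'pysplit', '500000']))
```

Passing the prompt function in as ask keeps the sketch testable without a console; a real script would simply pass sys.argv and accept the default.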

4.2.1.2 Binary file access

This code is careful to open both input and output files in binary mode (rb, wb), because it needs to portably handle things like executables and audio files, not just text. In Chapter 2, we learned that on Windows, text-mode files automatically map \r\n end-of-line sequences to \n on input, and map \n to \r\n on output. For true binary data, we really don't want any \r characters in the data to go away when read, and we don't want any superfluous \r characters to be added on output. Binary-mode files suppress this \r mapping when the script is run on Windows, and so avoid data corruption.
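The effect is easy to demonstrate. The sketch below uses Python 3 syntax, where text-mode reads translate \r\n to \n by default on every platform, not just Windows; the file name and sample bytes are made up for the demonstration:

```python
import os, tempfile

data = b'line1\r\nline2\r\n\x00\xff'             # bytes that include \r\n pairs

path = os.path.join(tempfile.mkdtemp(), 'sample.bin')
with open(path, 'wb') as f:                      # binary mode: bytes written as-is
    f.write(data)

with open(path, 'rb') as f:                      # binary mode: bytes read as-is
    raw = f.read()

with open(path, 'r', encoding='latin-1') as f:   # text mode: line ends translated
    text = f.read()

print(raw == data)                               # True: binary round trip is exact
print('\r' in text)                              # False: text mode dropped every \r
```

For an executable or audio file, the silently dropped \r bytes in the text-mode read are exactly the sort of corruption that binary mode avoids.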

4.2.1.3 Manually closing files

This script also goes out of its way to manually close its files. For instance:

fileobj  = open(filename, 'wb')
fileobj.write(chunk)
fileobj.close(  )

As we also saw in Chapter 2, these three lines can usually be replaced with this single line:

open(filename, 'wb').write(chunk)

This shorter form relies on the fact that the current Python implementation automatically closes files for you when file objects are reclaimed (i.e., when they are garbage collected, because there are no more references to the file object). In this line, the file object would be reclaimed immediately, because the open result is temporary in an expression, and is never referenced by a longer-lived name. The input file similarly is reclaimed when the split function exits.

As I was writing this chapter, though, there was some possibility that this automatic-close behavior might go away in the future.^[1] Moreover, the JPython Java-based Python implementation does not reclaim unreferenced objects as immediately as the standard Python. If you care about the Java port (or one possible future), if your script may create many files in a short amount of time, and if it may run on a machine that limits the number of open files per program, then close manually. The close calls in this script have never been necessary for my purposes, but because the split function in this module is intended to be a general-purpose tool, it accommodates such worst-case scenarios.
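Python releases that postdate the scripts in this chapter offer a third option: the with statement, which closes the file when its block exits, whether or not an exception occurred. A minimal sketch in Python 3 syntax follows; the write_part helper and the temporary path are hypothetical and do not appear in split.py:

```python
import os, tempfile

def write_part(filename, chunk):
    """Write one part file and make sure it is closed before returning."""
    with open(filename, 'wb') as fileobj:      # closed on block exit, even on error
        fileobj.write(chunk)
    return fileobj.closed                      # True once the with block has ended

partname = os.path.join(tempfile.mkdtemp(), 'part0001')   # throwaway demo path
print(write_part(partname, b'spam'))           # prints: True
```

This keeps the explicit-close guarantee without relying on when the implementation happens to reclaim the file object.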

4.2.2 Joining Files Portably

Back to moving big files around the house. After downloading a big game program file, my kids generally run the previous splitter script by clicking on its name in Windows Explorer and typing filenames. After a split, they simply copy each part file onto its own floppy, walk the floppies upstairs, and recreate the split output directory on their target computer by copying files off the floppies. Finally, the script in Example 4-2 is clicked or otherwise run to put the parts back together.

Example 4-2. PP2E\System\Filetools\join.py

#!/usr/bin/python
##########################################################
# join all part files in a dir created by split.py.  
# This is roughly like a 'cat fromdir/* > tofile' command
# on unix, but is a bit more portable and configurable,
# and exports the join operation as a reusable function.
# Relies on sort order of file names: must be same length.
# Could extend split/join to popup Tkinter file selectors.
##########################################################

import os, sys
readsize = 1024

def join(fromdir, tofile):
    output = open(tofile, 'wb')
    parts  = os.listdir(fromdir)
    parts.sort(  )
    for filename in parts:
        filepath = os.path.join(fromdir, filename)
        fileobj  = open(filepath, 'rb')
        while 1:
            filebytes = fileobj.read(readsize)
            if not filebytes: break
            output.write(filebytes)
        fileobj.close(  )
    output.close(  )

if __name__ == '__main__':
    if len(sys.argv) == 2 and sys.argv[1] == '-help':
        print 'Use: join.py [from-dir-name to-file-name]'
    else:
        if len(sys.argv) != 3:
            interactive = 1
            fromdir = raw_input('Directory containing part files? ')
            tofile  = raw_input('Name of file to be recreated? ')
        else:
            interactive = 0
            fromdir, tofile = sys.argv[1:]
        absfrom, absto = map(os.path.abspath, [fromdir, tofile])
        print 'Joining', absfrom, 'to make', absto

        try:
            join(fromdir, tofile)
        except:
            print 'Error joining files:'
            print sys.exc_type, sys.exc_value
        else:
            print 'Join complete: see', absto
        if interactive: raw_input('Press Enter key') # pause if clicked

After running the join script, they still may need to run something like zip, gzip, or tar to unpack an archive file, unless it's shipped as an executable;^[2] but at least they're much closer to seeing the Starship Enterprise spring into action. Here is a join in progress on Windows, combining the split files we made a moment ago:

C:\temp>python %X%\System\Filetools\join.py -help
Use: join.py [from-dir-name to-file-name]

C:\temp>python %X%\System\Filetools\join.py pysplit mypy152.exe
Joining C:\temp\pysplit to make C:\temp\mypy152.exe
Join complete: see C:\temp\mypy152.exe

C:\temp>ls -l mypy152.exe py152.exe
-rwxrwxrwa   1 0        0        5028339 Sep 12 06:05 mypy152.exe
-rwxrwxrwa   1 0        0        5028339 Apr 16  1999 py152.exe

C:\temp>fc /b mypy152.exe py152.exe
Comparing files mypy152.exe and py152.exe
FC: no differences encountered

The join script simply uses os.listdir to collect all the part files in a directory created by split, and sorts the filename list to put the parts back together in the correct order. We get back an exact byte-for-byte copy of the original file (proved by the DOS fc command above; use cmp on Unix).
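The byte-for-byte check can also be done portably from Python itself, using the standard library's filecmp module instead of fc or cmp. This is only a sketch in Python 3 syntax; the same_bytes helper and the three demo files are invented for illustration:

```python
import filecmp, os, tempfile

def same_bytes(path1, path2):
    """True if the two files hold identical bytes."""
    # shallow=False forces a content comparison instead of trusting os.stat data
    return filecmp.cmp(path1, path2, shallow=False)

# Build three small files to compare (names are made up for this demo)
tmpdir = tempfile.mkdtemp()
original = os.path.join(tmpdir, 'original.bin')
rejoined = os.path.join(tmpdir, 'rejoined.bin')
damaged = os.path.join(tmpdir, 'damaged.bin')
for path, content in [(original, b'\x00\r\n\xff' * 1000),
                      (rejoined, b'\x00\r\n\xff' * 1000),
                      (damaged,  b'\x00\n\xff' * 1000)]:
    with open(path, 'wb') as f:
        f.write(content)

print(same_bytes(original, rejoined))          # True: identical content
print(same_bytes(original, damaged))           # False: the \r bytes are missing
```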

Some of this process is still manual, of course (I haven't quite figured out how to script the "walk the floppies upstairs" bit yet), but the split and join scripts make it both quick and simple to move big files around. Because this script is also portable Python code, it runs on any platform we care to move split files to. For instance, it's typical for my kids to download both Windows and Linux games; since this script runs on either platform, they're covered.

4.2.2.1 Reading by blocks or files

Before we move on, there are a couple of details worth underscoring in the join script's code. First of all, notice that this script deals with files in binary mode, but also reads each part file in blocks of 1K bytes. In fact, the readsize setting here (the size of each block read from an input part file) has no relation to chunksize in split.py (the total size of each output part file). As we learned in Chapter 2, this script could instead read each part file all at once:

filebytes = open(filepath, 'rb').read(  )
output.write(filebytes)

The downside to this scheme is that it really does load all of a file into memory at once. For example, reading a 1.4M part file into memory all at once with the file object read method generates a 1.4M string in memory to hold the file's bytes. Since split allows users to specify even larger chunk sizes, the join script plans for the worst and reads in terms of limited-size blocks. To be completely robust, the split script could read its input data in smaller chunks too, but this hasn't become a concern in practice.
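The block-reading loop can be isolated into a small helper to show that memory use is bounded by readsize rather than by the part file's size. The sketch below is in Python 3 syntax and uses in-memory io.BytesIO objects as stand-ins for real files; copy_blocks is a hypothetical name, not a function in join.py:

```python
import io

def copy_blocks(infile, outfile, readsize=1024):
    """Copy infile to outfile in blocks of at most readsize bytes; return block count."""
    blocks = 0
    while True:
        filebytes = infile.read(readsize)      # never more than readsize in memory
        if not filebytes:                      # empty bytes object signals end of file
            break
        outfile.write(filebytes)
        blocks += 1
    return blocks

# In-memory stand-ins for one part file and the joined output file
part = io.BytesIO(b'x' * 2500)
output = io.BytesIO()
print(copy_blocks(part, output), len(output.getvalue()))   # prints: 3 2500
```

The 2,500 bytes arrive as blocks of 1,024, 1,024, and 452 bytes. The standard library's shutil.copyfileobj performs a similar buffered loop, if you'd rather not code it yourself.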

4.2.2.2 Sorting filenames

If you study this script's code closely, you may also notice that the join scheme it uses relies completely on the sort order of filenames in the parts directory. Because it simply calls the list sort method on the filenames list returned by os.listdir, it implicitly requires that filenames have the same length and format when created by split. The splitter uses zero-padding notation in a string formatting expression ('part%04d') to make sure that filenames all have the same number of digits at the end (four), much like this list:

>>> list = ['xx008', 'xx010', 'xx006', 'xx009', 'xx011', 'xx111']
>>> list.sort(  )
>>> list
['xx006', 'xx008', 'xx009', 'xx010', 'xx011', 'xx111']

When sorted, the leading zero characters in small numbers guarantee that part files are ordered for joining correctly. Without the leading zeroes, join would fail whenever there were more than nine part files, because the first digit would dominate:

>>> list = ['xx8', 'xx10', 'xx6', 'xx9', 'xx11', 'xx111']
>>> list.sort(  )
>>> list
['xx10', 'xx11', 'xx111', 'xx6', 'xx8', 'xx9']

Because the list sort method accepts a comparison function as an argument, we could in principle strip off digits in filenames and sort numerically:

>>> list = ['xx8', 'xx10', 'xx6', 'xx9', 'xx11', 'xx111']
>>> list.sort(lambda x, y: cmp(int(x[2:]), int(y[2:])))
>>> list
['xx6', 'xx8', 'xx9', 'xx10', 'xx11', 'xx111']

But that still implies that filenames all must start with a same-length substring, so this doesn't quite remove the file naming dependency between the split and join scripts. Because these scripts are designed to be two steps of the same process, though, some dependencies between them seem reasonable.
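In Python 3, the list sort method no longer accepts a comparison function; a key function plays the same role. The following sketch shows one possible variation that pulls the trailing digits out with a regular expression, which drops the fixed-length prefix assumption but still depends on names ending in digits. The part_number helper is hypothetical:

```python
import re

def part_number(filename):
    """Return the trailing digits of a filename as an int."""
    match = re.search(r'(\d+)$', filename)
    if match is None:
        raise ValueError('no trailing digits in %r' % filename)
    return int(match.group(1))

names = ['xx8', 'xx10', 'xx6', 'xx9', 'xx11', 'xx111']
print(sorted(names, key=part_number))
# prints: ['xx6', 'xx8', 'xx9', 'xx10', 'xx11', 'xx111']
```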

4.2.3 Usage Variations

Let's run a few more experiments with these Python system utilities to demonstrate other usage modes. When run without full command-line arguments, both split and join are smart enough to input their parameters interactively. Here they are chopping and gluing the Python self-installer file on Windows again, with parameters typed in the DOS console window:

C:\temp>python %X%\System\Filetools\split.py
File to be split? py152.exe
Directory to store part files? splitout
Splitting C:\temp\py152.exe to C:\temp\splitout by 1433600
Split finished: 4 parts are in C:\temp\splitout
Press Enter key

C:\temp>python %X%\System\Filetools\join.py
Directory containing part files? splitout
Name of file to be recreated? newpy152.exe
Joining C:\temp\splitout to make C:\temp\newpy152.exe
Join complete: see C:\temp\newpy152.exe
Press Enter key

C:\temp>fc /B py152.exe newpy152.exe
Comparing files py152.exe and newpy152.exe
FC: no differences encountered

When these program files are double-clicked in a file explorer GUI, they work the same way (there usually are no command-line arguments when launched this way). In this mode, absolute path displays help clarify where files really are. Remember, the current working directory is the script's home directory when clicked like this, so the name tempsplit actually maps to a source code directory; type a full path to make the split files show up somewhere else:

 [in a popup DOS console box when split is clicked]
File to be split? c:\temp\py152.exe 
Directory to store part files? tempsplit 
Splitting c:\temp\py152.exe to C:\PP2ndEd\examples\PP2E\System\Filetools\
tempsplit by 1433600
Split finished: 4 parts are in C:\PP2ndEd\examples\PP2E\System\Filetools\
tempsplit
Press Enter key

 [in a popup DOS console box when join is clicked]
Directory containing part files? tempsplit 
Name of file to be recreated? c:\temp\morepy152.exe 
Joining C:\PP2ndEd\examples\PP2E\System\Filetools\tempsplit to make 
c:\temp\morepy152.exe
Join complete: see c:\temp\morepy152.exe
Press Enter key

Because these scripts package their core logic up in functions, though, it's just as easy to reuse their code by importing and calling from another Python component:

C:\temp>python
>>> from PP2E.System.Filetools.split import split
>>> from PP2E.System.Filetools.join  import join
>>>
>>> numparts = split('py152.exe', 'calldir')
>>> numparts
4
>>> join('calldir', 'callpy152.exe')
>>>
>>> import os
>>> os.system(r'fc /B py152.exe callpy152.exe')
Comparing files py152.exe and callpy152.exe
FC: no differences encountered
0

A word about performance: All the split and join tests shown so far process a 5M file, but take at most one second of real wall-clock time to finish on my Windows 98 300 and 650 MHz laptop computers -- plenty fast for just about any use I could imagine. (They run even faster after Windows has cached information about the files involved.) Both scripts run just as fast for other reasonable part file sizes too; here is the splitter chopping up the file into 500,000- and 50,000-byte parts:

C:\temp>python %X%\System\Filetools\split.py py152.exe tempsplit 500000
Splitting C:\temp\py152.exe to C:\temp\tempsplit by 500000
Split finished: 11 parts are in C:\temp\tempsplit

C:\temp>ls -l tempsplit
total 9826
-rwxrwxrwa   1 0        0         500000 Sep 12 06:29 part0001
-rwxrwxrwa   1 0        0         500000 Sep 12 06:29 part0002
-rwxrwxrwa   1 0        0         500000 Sep 12 06:29 part0003
-rwxrwxrwa   1 0        0         500000 Sep 12 06:29 part0004
-rwxrwxrwa   1 0        0         500000 Sep 12 06:29 part0005
-rwxrwxrwa   1 0        0         500000 Sep 12 06:29 part0006
-rwxrwxrwa   1 0        0         500000 Sep 12 06:29 part0007
-rwxrwxrwa   1 0        0         500000 Sep 12 06:29 part0008
-rwxrwxrwa   1 0        0         500000 Sep 12 06:29 part0009
-rwxrwxrwa   1 0        0         500000 Sep 12 06:29 part0010
-rwxrwxrwa   1 0        0          28339 Sep 12 06:29 part0011

C:\temp>python %X%\System\Filetools\split.py py152.exe tempsplit 50000
Splitting C:\temp\py152.exe to C:\temp\tempsplit by 50000
Split finished: 101 parts are in C:\temp\tempsplit

C:\temp>ls tempsplit
part0001  part0014  part0027  part0040  part0053  part0066  part0079  part0092
part0002  part0015  part0028  part0041  part0054  part0067  part0080  part0093
part0003  part0016  part0029  part0042  part0055  part0068  part0081  part0094
part0004  part0017  part0030  part0043  part0056  part0069  part0082  part0095
part0005  part0018  part0031  part0044  part0057  part0070  part0083  part0096
part0006  part0019  part0032  part0045  part0058  part0071  part0084  part0097
part0007  part0020  part0033  part0046  part0059  part0072  part0085  part0098
part0008  part0021  part0034  part0047  part0060  part0073  part0086  part0099
part0009  part0022  part0035  part0048  part0061  part0074  part0087  part0100
part0010  part0023  part0036  part0049  part0062  part0075  part0088  part0101
part0011  part0024  part0037  part0050  part0063  part0076  part0089
part0012  part0025  part0038  part0051  part0064  part0077  part0090
part0013  part0026  part0039  part0052  part0065  part0078  part0091

Split can take longer to finish, but only if the part file's size is set small enough to generate thousands of part files -- splitting into 1006 parts works, but runs slower (on my computer this split and join take about five and two seconds, respectively, depending on what other programs are open):

C:\temp>python %X%\System\Filetools\split.py py152.exe tempsplit 5000 
Splitting C:\temp\py152.exe to C:\temp\tempsplit by 5000
Split finished: 1006 parts are in C:\temp\tempsplit

C:\temp>python %X%\System\Filetools\join.py tempsplit mypy152.exe 
Joining C:\temp\tempsplit to make C:\temp\mypy152.exe
Join complete: see C:\temp\mypy152.exe

C:\temp>fc /B py152.exe mypy152.exe 
Comparing files py152.exe and mypy152.exe
FC: no differences encountered

C:\temp>ls -l tempsplit 
 ...1000 lines deleted...
-rwxrwxrwa   1 0        0           5000 Sep 12 06:30 part1001
-rwxrwxrwa   1 0        0           5000 Sep 12 06:30 part1002
-rwxrwxrwa   1 0        0           5000 Sep 12 06:30 part1003
-rwxrwxrwa   1 0        0           5000 Sep 12 06:30 part1004
-rwxrwxrwa   1 0        0           5000 Sep 12 06:30 part1005
-rwxrwxrwa   1 0        0           3339 Sep 12 06:30 part1006

Finally, the splitter is also smart enough to create the output directory if it doesn't yet exist, or clear out any old files there if it does exist. Because the joiner combines whatever files exist in the output directory, this is a nice ergonomic touch -- if the output directory were not cleared before each split, it would be too easy to forget that a prior run's files are still there. Given that my kids are running these scripts, they need to be as forgiving as possible; your user base may vary, but probably not by much.

C:\temp>python %X%\System\Filetools\split.py py152.exe tempsplit 700000 
Splitting C:\temp\py152.exe to C:\temp\tempsplit by 700000
Split finished: 8 parts are in C:\temp\tempsplit

C:\temp>ls -l tempsplit
total 9827
-rwxrwxrwa   1 0        0         700000 Sep 12 06:32 part0001
-rwxrwxrwa   1 0        0         700000 Sep 12 06:32 part0002
-rwxrwxrwa   1 0        0         700000 Sep 12 06:32 part0003
...
 ...only new files here...
...
-rwxrwxrwa   1 0        0         700000 Sep 12 06:32 part0006
-rwxrwxrwa   1 0        0         700000 Sep 12 06:32 part0007
-rwxrwxrwa   1 0        0         128339 Sep 12 06:32 part0008
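The create-or-clear step is small enough to sketch on its own. The version below follows the same logic as the first lines of the split function, restated in Python 3 syntax; the prepare_dir name and the temporary directory used in the demo are assumptions for illustration:

```python
import os, tempfile

def prepare_dir(todir):
    """Create todir if it is missing; otherwise delete the files already in it."""
    if not os.path.exists(todir):
        os.mkdir(todir)
    else:
        for fname in os.listdir(todir):
            os.remove(os.path.join(todir, fname))   # raises an error on subdirectories
    return os.listdir(todir)                        # always empty on success

tmpdir = tempfile.mkdtemp()
todir = os.path.join(tmpdir, 'tempsplit')
print(prepare_dir(todir))                           # directory did not exist yet
open(os.path.join(todir, 'part0009'), 'wb').close() # simulate a stale part file
print(prepare_dir(todir))                           # stale file removed
```

Both calls print an empty list. Like the original, this sketch leaves error handling to the caller, and it assumes the directory holds only plain files.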

4.3 Generating Forward-Link Web Pages

Moving is rarely painless, even in the brave new world of cyberspace. Changing your web site's Internet address can lead to all sorts of confusion -- you need to ask known contacts to use the new address, and hope that others will eventually stumble onto it themselves. But if you rely on the Internet, moves are bound to generate at least as much confusion as an address change in the real world.

Unfortunately, such site relocations are often unavoidable. Both ISPs (Internet Service Providers) and server machines come and go over the years. Moreover, some ISPs let their service fall to intolerable levels; if you are unlucky enough to have signed up with such an ISP, there is not much recourse but to change providers, and that often implies a change of web addresses.^[3]

Imagine, though, that you are an O'Reilly author, and have published your web site's address in multiple books sold widely all over the world. What to do, when your ISP's service level requires a site change? Notifying the tens or hundreds of thousands of readers out there isn't exactly a practical solution.

Probably the best you can do is to leave forwarding instructions at the old site, for some reasonably long period of time -- the virtual equivalent of a "We've Moved" sign in a storefront window. On the Web, such a sign can also send visitors to the new site automatically: simply leave a page at the old site containing a hyperlink to the page's address at the new site. With such forward-link files in place, visitors to the old addresses will be only one click away from reaching the new ones.

That sounds simple enough. But because visitors might try to directly access the address of any file at your old site, you generally need to leave one forward-link file for every old file -- HTML pages, images, and so on. If you happen to enjoy doing lots of mindless typing, you could create each forward-link file by hand. But given that my home site contains 140 files today, the prospect of running one editor session per file was more than enough motivation for an automated solution.

4.3.1 Page Template File

Here's what I came up with. First of all, I create a general page template text file, shown in Example 4-3, to describe how all the forward-link files should look, with parts to be filled in later.

Example 4-3. PP2E\System\Filetools\template.html

<HTML><BODY>
<H1>This page has moved</H1>

<P>This page now lives at this address:

<P><A HREF="http://$server$/$home$/$file$">
http://$server$/$home$/$file$</A>

<P>Please click on the new address to jump to this page, and
update any links accordingly.  
</P>

<HR>
<H3><A HREF="ispmove.html">Why the move? - The ISP story</A></H3>

</BODY></HTML>

To fully understand this template, you have to know something about HTML -- a web page description language that we'll explore in Chapter 12. But for the purposes of this example, you can ignore most of this file and focus on just the parts surrounded by dollar signs: the strings $server$, $home$, and $file$ are targets to be replaced with real values by global text substitutions. They represent items that vary per site relocation and file.
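As a quick illustration of such a substitution, the sketch below fills in one line of the template using the string replace method in Python 3 syntax. The fill helper and the dictionary layout are hypothetical; the sample values are taken from the example run shown later in this section:

```python
template = '<A HREF="http://$server$/$home$/$file$">'   # one line of Example 4-3

values = {                                   # target string -> replacement text
    '$server$': 'starship.python.net',
    '$home$':   '~lutz/home',
    '$file$':   'about-ppr2e.html',
}

def fill(text, values):
    """Replace every occurrence of each target string with its value."""
    for target, value in values.items():
        text = text.replace(target, value)
    return text

print(fill(template, values))
# prints: <A HREF="http://starship.python.net/~lutz/home/about-ppr2e.html">
```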

4.3.2 Page Generator Script

Now, given a page template file, the Python script in Example 4-4 generates all the required forward-link files automatically.

Example 4-4. PP2E\System\Filetools\site-forward.py

#######################################################
# Create forward link pages for relocating a web site.
# Generates one page for every existing site file;
# upload the generated files to your old web site.
# Performance note: the first 2 string.replace calls
# could be moved out of the for loop, but this runs 
# in < 1 second on my Win98 machine for 150 site files.
# Lib note: the os.listdir call can be replaced with:
# sitefiles = glob.glob(sitefilesdir + os.sep + '*') 
# but then the file/directory names must be split
# with: dirname, filename = os.path.split(sitefile); 
#######################################################

import os, string
servername   = 'starship.python.net'     # where site is relocating to
homedir      = '~lutz/home'              # where site will be rooted
sitefilesdir = 'public_html'             # where site files live locally
uploaddir    = 'isp-forward'             # where to store forward files
templatename = 'template.html'           # template for generated pages

try: 
    os.mkdir(uploaddir)                  # make upload dir if needed
except OSError: pass

template  = open(templatename).read(  )    # load or import template text 
sitefiles = os.listdir(sitefilesdir)     # filenames, no directory prefix

count = 0
for filename in sitefiles:
    fwdname = os.path.join(uploaddir, filename)        # or + os.sep + filename
    print 'creating', filename, 'as', fwdname

    filetext = string.replace(template, '$server$', servername)   # insert text 
    filetext = string.replace(filetext, '$home$',   homedir)      # and write
    filetext = string.replace(filetext, '$file$',   filename)     # file varies
    open(fwdname, 'w').write(filetext)
    count = count + 1

print 'Last file =>\n', filetext
print 'Done:', count, 'forward files created.'

Notice that the template's text is loaded by reading a file; it would work just as well to code it as an imported Python string variable (e.g., a triple-quoted string in a module file). Also observe that all configuration options are assignments at the top of the script, not command-line arguments; since they change so seldom, it's convenient to type them just once in the script itself.

But the main thing worth noticing here is that this script doesn't care what the template file looks like at all; it simply performs global substitutions blindly in its text, with a different filename value for each generated file. In fact, we can change the template file any way we like, without having to touch the script. Such a division of labor can be used in all sorts of contexts -- generating "makefiles," form-letters, and so on. In terms of library tools, the generator script simply:

Uses os.listdir to step through all the filenames in the site's directory
Uses string.replace to perform global search-and-replace operations that fill in the $-delimited targets in the template file's text
Uses os.path.join and built-in file objects to write the resulting text out to a forward-link file of the same name, in an output directory
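The substitution step in the list above can be tried in isolation. The following is only a sketch: the one-line template text is made up for illustration (the real template.html is not reproduced here), and it uses string methods so that it also runs on current Python releases, rather than the string module calls in the listing.

```python
# Hypothetical one-line template; the real template.html is longer.
TEMPLATE = '<A HREF="http://$server$/$home$/$file$">$file$ has moved</A>'

def fill_template(template, server, home, filename):
    """Replace every occurrence of each $-delimited target with its value."""
    text = template.replace('$server$', server)
    text = text.replace('$home$', home)
    text = text.replace('$file$', filename)      # the only value that varies per page
    return text

page = fill_template(TEMPLATE, 'starship.python.net', '~lutz/home', 'index.html')
```

Because the function never inspects the template beyond the three targets, the template text can change freely without any change to the code.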

The end result is a mirror-image of the original web site directory, containing only forward-link files generated from the page template. As an added bonus, the generator script can be run on just about any Python platform -- I can run it on both my Windows laptop (where my web site files are maintained) and a Unix server where I keep a copy of my site. Here it is in action on Windows:

C:\Stuff\Website>python %X%\System\Filetools\site-forward.py 
creating about-hopl.html as isp-forward\about-hopl.html
creating about-lp-toc.html as isp-forward\about-lp-toc.html
creating about-lp.html as isp-forward\about-lp.html
creating about-pp-japan.html as isp-forward\about-pp-japan.html
...
 ...more lines deleted...
...
creating whatsold.html as isp-forward\whatsold.html
creating xlate-lp.html as isp-forward\xlate-lp.html
creating about-pp2e.html as isp-forward\about-pp2e.html
creating about-ppr2e.html as isp-forward\about-ppr2e.html
Last file =>
<HTML><BODY>
<H1>This page has moved</H1>

<P>This page now lives at this address:

<P><A HREF="http://starship.python.net/~lutz/home/about-ppr2e.html">
http://starship.python.net/~lutz/home/about-ppr2e.html</A>

<P>Please click on the new address to jump to this page, and
update any links accordingly.
</P>

<HR>
<H3><A HREF="ispmove.html">Why the move? - The ISP story</A></H3>

</BODY></HTML>

Done: 137 forward files created.

To verify this script's output, double-click on any of the output files to see what they look like in a web browser (or run a start command in a DOS console on Windows, e.g., start isp-forward\about-ppr2e.html). Figure 4-1 shows what one generated page looks like on my machine.

Figure 4-1. Site-forward output file page


To complete the process, you still need to install the forward links: upload all the generated files in the output directory to your old site's web directory. If even that is too much to do by hand, see the FTP site upload scripts in Chapter 11 for an automatic way to do it with Python (PP2E\Internet\Ftp\uploadflat.py will do the job). Once you've caught the scripting bug, you'll be amazed at how much manual labor Python can automate.

4.4 A Regression Test Script

As we've seen, Python provides interfaces to a variety of system services, along with tools for adding others. Example 4-5 shows some commonly used services in action. It implements a simple regression-test system, by running a command-line program with a set of given input files and comparing the output of each run to the prior run's results. This script was adapted from an automated testing system I wrote to catch errors introduced by changes in program source files; in a big system, you might not know when a fix is really a bug in disguise.

Example 4-5. PP2E\System\Filetools\regtest.py

#!/usr/local/bin/python
import os, sys                            # get unix, python services 
from stat import ST_SIZE                  # or use os.path.getsize
from glob import glob                     # file name expansion
from os.path import exists                # file exists test
from time import time, ctime              # time functions

print 'RegTest start.' 
print 'user:', os.environ['USER']         # environment variables
print 'path:', os.getcwd(  )              # current directory
print 'time:', ctime(time(  )), '\n'
program = sys.argv[1]                     # two command-line args
testdir = sys.argv[2]

for test in glob(testdir + '/*.in'):      # for all matching input files
    if not exists('%s.out' % test):
        # no prior results
        os.system('%s < %s > %s.out 2>&1' % (program, test, test))
        print 'GENERATED:', test
    else: 
        # backup, run, compare
        os.rename(test + '.out', test + '.out.bkp')
        os.system('%s < %s > %s.out 2>&1' % (program, test, test))
        os.system('diff %s.out %s.out.bkp > %s.diffs' % ((test,)*3) )
        if os.stat(test + '.diffs')[ST_SIZE] == 0:
            print 'PASSED:', test 
            os.remove(test + '.diffs')
        else:
            print 'FAILED:', test, '(see %s.diffs)' % test

print 'RegTest done:', ctime(time(  ))

Some of this script is Unix-biased. For instance, the 2>&1 syntax to redirect stderr works on Unix and Windows NT/2000, but not on Windows 9x, and the diff command line spawned is a Unix utility. You'll need to tweak such code a bit to run this script on some platforms. Also, given the improvements to the os module's popen calls in Python 2.0, they have now become a more portable way to redirect streams in such a script, and an alternative to shell command redirection syntax.
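As a rough illustration of avoiding both the shell redirection syntax and the external diff program, the sketch below redirects streams and compares results entirely in Python. It is not the approach regtest.py takes: it relies on the subprocess and difflib standard library modules, which are newer than the os.popen calls mentioned above, and the function names run_program and differences are made up for this example.

```python
import difflib
import subprocess

def run_program(command, input_path, output_path):
    """Run command (a list of strings) with stdin read from input_path,
    sending stdout and stderr together to output_path; return exit status."""
    with open(input_path, 'rb') as inp, open(output_path, 'wb') as out:
        return subprocess.call(command, stdin=inp, stdout=out,
                               stderr=subprocess.STDOUT)

def differences(old_text, new_text):
    """Return unified-diff lines for two outputs; an empty list means no change."""
    return list(difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(),
        fromfile='prior', tofile='current', lineterm=''))
```

A test passes when differences returns an empty list for the prior and current output text, which plays the same role as the zero-size .diffs file check in the script.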

But this script's basic operation is straightforward: for each filename with an .in suffix in the test directory, this script runs the program named on the command line and looks for deviations in its results. This is an easy way to spot changes (called "regressions") in the behavior of programs spawned from the shell. The real secret of this script's success is in the filenames used to record test information: within a given test directory testdir:

testdir/test.in files represent standard input sources for program runs.
testdir/test.in.out files represent the output generated for each input file.
testdir/test.in.out.bkp files are backups of prior .in.out result files.
testdir/test.in.diffs files represent regressions; output file differences.
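The naming scheme in the list above amounts to appending suffixes to each input file's name. A small sketch makes the mapping explicit; the helper name related_files is hypothetical and does not appear in regtest.py.

```python
def related_files(input_name):
    """Map one .in file name to the other file names the tester derives from it."""
    return {
        'input':  input_name,                  # standard input source
        'output': input_name + '.out',         # output of the latest run
        'backup': input_name + '.out.bkp',     # output of the prior run
        'diffs':  input_name + '.diffs',       # differences, kept only on failure
    }

names = related_files('test1/t1.in')
```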

Output and difference files are generated in the test directory, with distinct suffixes. For example, if we have an executable program or script called shrubbery, and a test directory called test1 containing a set of .in input files, a typical run of the tester might look something like this:

% regtest.py shrubbery test1
RegTest start.
user: mark
path: /home/mark/stuff/python/testing
time: Mon Feb 26 21:13:20 1996

FAILED: test1/t1.in (see test1/t1.in.diffs)
PASSED: test1/t2.in
FAILED: test1/t3.in (see test1/t3.in.diffs)
RegTest done: Mon Feb 26 21:13:27 1996

Here, shrubbery is run three times, for the three .in canned input files, and the results of each run are compared to output generated for these three inputs the last time testing was conducted. Such a Python script might be launched once a day, to automatically spot deviations caused by recent source code changes (e.g., from a cron job on Unix).

We've already met system interfaces used by this script; most are fairly standard Unix calls, and not very Python-specific to speak of. In fact, much of what happens when we run this script occurs in programs spawned by os.system calls. This script is really just a driver; because it is completely independent of both the program to be tested and the inputs it will read, we can add new test cases on the fly by dropping a new input file in a test directory.

So given that this script just drives other programs with standard Unix-like calls, why use Python here instead of something like C? First of all, the equivalent program in C would be much longer: it would need to declare variables, handle data structures, and more. In C, all external services exist in a single global scope (the linker's scope); in Python, they are partitioned into module namespaces (os, sys, etc.) to avoid name clashes. And unlike C, the Python code can be run immediately, without compiling and linking; changes can be tested much quicker in Python. Moreover, with just a little extra work we could make this script run on Windows 9x too. As you can probably tell by now, Python excels when it comes to portability and productivity.

Because of such benefits, automated testing is a very common role for Python scripts. If you are interested in using Python for testing, be sure to see Python's web site (http://www.python.org) for other available tools (e.g., the PyUnit system).

Testing Gone Bad?

Once we learn about sending email from Python scripts in Chapter 11, you might also want to augment this script to automatically send out email when regularly run tests fail. That way, you don't even need to remember to check results. Of course, you could go further still.

One company I worked at added sound effects to compiler test scripts; you got an audible round of applause if no regressions were found, and an entirely different noise otherwise. (See the end of this chapter and playfile.py in Chapter 11 for audio hints.)

Another company in my development past ran a nightly test script that automatically isolated the source code file check-in that triggered a test regression, and sent a nasty email to the guilty party (and their supervisor). Nobody expects the Spanish Inquisition!

4.5 Packing and Unpacking Files

Many moons ago (about five years), I used machines that had no tools for bundling files into a single package for easy transport. The situation is this: you have a large set of text files lying around that you need to transfer to another computer. These days, tools like tar are widely available for packaging many files into a single file that can be copied, uploaded, mailed, or otherwise transferred in a single step. Even Python itself has grown to support zip archives in the 2.0 standard library (see module zipfile).

Before I managed to install such tools on my PC, though, portable Python scripts served just as well. Example 4-6 copies all the files listed on the command line to the standard output stream, separated by marker lines.

Example 4-6. PP2E\System\App\Clients\textpack.py

#!/usr/local/bin/python
import sys                           # load the system module
marker = ':'*10 + 'textpak=>'     # hopefully unique separator

def pack(  ):
    for name in sys.argv[1:]:        # for all command-line arguments
        input = open(name, 'r')      # open the next input file
        print marker + name          # write a separator line
        print input.read(  ),        # and write the file's contents

if __name__ == '__main__': pack(  )  # pack files listed on cmdline

The first line in this file is a Python comment (#...), but it also gives the path to the Python interpreter using the Unix executable-script trick discussed in Chapter 2. If we give textpack.py executable permission with a Unix chmod command, we can pack files by running this program file directly from a Unix shell, and redirect its standard output stream to the file we want the packed archive to show up in. It works the same on Windows, but we just type the interpreter name "python" instead:

C:\...\PP2E\System\App\Clients\test>type spam.txt
SPAM
spam

C:\......\test>python ..\textpack.py spam.txt eggs.txt ham.txt > packed.all

C:\......\test>type packed.all
::::::::::textpak=>spam.txt
SPAM
spam
::::::::::textpak=>eggs.txt
EGGS
::::::::::textpak=>ham.txt
ham

Running the program this way creates a single output file called packed.all, which contains all three input files, with a header line giving the original file's name before each file's contents. Combining many files into one like this makes it easy to transfer in a single step -- only one file need be copied to floppy, emailed, and so on. If you have hundreds of files to move, this can be a big win.

After such a file is transferred, though, it must somehow be unpacked on the receiving end, to recreate the original files. To do so, we need to scan the combined file line by line, watching for header lines left by the packer to know when a new file's contents begins. Another simple Python script, shown in Example 4-7, does the trick.

Example 4-7. PP2E\System\App\Clients\textunpack.py

#!/usr/local/bin/python
import sys
from textpack import marker                    # use common separator key
mlen = len(marker)                             # file names after markers

for line in sys.stdin.readlines(  ):           # for all input lines
    if line[:mlen] != marker:
        print line,                            # write real lines
    else:
        sys.stdout = open(line[mlen:-1], 'w')  # or make new output file

We could code this in a function like we did in textpack, but there is little point here -- as written, the script relies on standard streams, not function parameters. Run this in the directory where you want unpacked files to appear, with the packed archive file piped in on the command line as the script's standard input stream:

C:\......\test\unpack>python ..\..\textunpack.py < ..\packed.all

C:\......\test\unpack>ls
eggs.txt  ham.txt   spam.txt

C:\......\test\unpack>type spam.txt
SPAM
spam
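The marker-line format can also be exercised in memory, without files or redirected streams. The sketch below is an illustration under assumptions, not a replacement for the two scripts: pack_text and unpack_text are made-up names, every file's text is assumed to end with a newline, and lines that appear before the first marker are dropped rather than echoed.

```python
MARKER = ':' * 10 + 'textpak=>'              # same separator text as textpack.py

def pack_text(files):
    """files is a list of (name, text) pairs; return one packed string."""
    parts = []
    for name, text in files:
        parts.append(MARKER + name + '\n')   # header line names the file
        parts.append(text)
    return ''.join(parts)

def unpack_text(packed):
    """Split a packed string back into a list of (name, text) pairs."""
    files = []
    for line in packed.splitlines(True):     # True keeps the line endings
        if line.startswith(MARKER):
            files.append([line[len(MARKER):].rstrip('\n'), ''])
        elif files:                          # ignore lines before any marker
            files[-1][1] += line
    return [(name, text) for name, text in files]

original = [('spam.txt', 'SPAM\nspam\n'), ('eggs.txt', 'EGGS\n'), ('ham.txt', 'ham\n')]
restored = unpack_text(pack_text(original))
```

A round trip like this one is a convenient check that the packer and unpacker agree on the separator.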

4.5.1 Packing Files "++"

So far so good; the textpack and textunpack scripts made it easy to move lots of files around, without lots of manual intervention. But after playing with these and similar scripts for a while, I began to see commonalities that almost cried out for reuse. For instance, almost every shell tool I wrote had to scan command-line arguments, redirect streams to a variety of sources, and so on. Further, almost every command-line utility wound up with a different command-line option pattern, because each was written from scratch.

The following few classes are one solution to such problems. They define a class hierarchy that is designed for reuse of common shell tool code. Moreover, because of the reuse going on, every program that ties into its hierarchy sports a common look-and-feel in terms of command-line options, environment variable use, and more. As usual with object-oriented systems, once you learn which methods to overload, such a class framework provides a lot of work and consistency for free. The module in Example 4-8 adapts the textpack script's logic for integration into this hierarchy.

Example 4-8. PP2E\System\App\Clients\packapp.py

#!/usr/local/bin/python
######################################################
# pack text files into one, separated by marker line;
# % packapp.py -v -o target src src...
# % packapp.py *.txt -o packed1
# >>> apptools.appRun('packapp.py', args...)
# >>> apptools.appCall(PackApp, args...)
######################################################

from textpack import marker
from PP2E.System.App.Kinds.redirect import StreamApp

class PackApp(StreamApp):
    def start(self):
        StreamApp.start(self)
        if not self.args:
            self.exit('packapp.py [-o target]? src src...')
    def run(self):
        for name in self.restargs(  ):
            try:
                self.message('packing: ' + name)
                self.pack_file(name)
            except:
                self.exit('error processing: ' + name)
    def pack_file(self, name):  
        self.setInput(name)             
        self.write(marker + name + '\n')
        while 1:
            line = self.readline(  )
            if not line: break
            self.write(line)

if __name__ == '__main__':  PackApp(  ).main(  )

Here, PackApp inherits members and methods that handle:

Operating system services
Command-line processing
Input/output stream redirection

from the StreamApp class, imported from another Python module file (listed in Example 4-10). StreamApp provides a "read/write" interface to redirected streams, and provides a standard "start/run/stop" script execution protocol. PackApp simply redefines the start and run methods for its own purposes, and reads and writes itself to access its standard streams. Most low-level system interfaces are hidden by the StreamApp class; in OOP terms, we say they are encapsulated.
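To see the "start/run/stop" idea on its own, here is a stripped-down sketch of such a protocol. MiniApp and Shout are hypothetical stand-ins invented for this illustration; they omit the argument parsing, stream handling, and error reporting that the real classes provide.

```python
class MiniApp:
    """Hypothetical root class: main() fixes the order of the steps."""
    def __init__(self):
        self.log = []
    def main(self):
        self.start()
        self.run()
        return self.stop()
    def start(self):
        self.log.append('start')
    def run(self):
        raise NotImplementedError('run must be redefined')
    def stop(self):
        self.log.append('done')

class Shout(MiniApp):
    """A client redefines only the steps it cares about."""
    def __init__(self, words):
        MiniApp.__init__(self)
        self.words = words
    def run(self):
        self.log.append(' '.join(self.words).upper())

app = Shout(['spam', 'eggs'])
app.main()
```

The superclass decides when each step happens; the subclass decides only what its own step does.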

This module can both be run as a program, and imported by a client (remember, Python sets a module's name to __main__ when it's run directly, so it can tell the difference). When run as a program, the last line creates an instance of the PackApp class, and starts it by calling its main method -- a method call exported by StreamApp to kick off a program run:

C:\......\test>python ..\packapp.py -v -o packedapp.all spam.txt eggs.txt ham.txt
PackApp start.
packing: spam.txt
packing: eggs.txt
packing: ham.txt
PackApp done.

C:\......\test>type packedapp.all
::::::::::textpak=>spam.txt
SPAM
spam
::::::::::textpak=>eggs.txt
EGGS
::::::::::textpak=>ham.txt
ham

This has the same effect as the textpack.py script, but command-line options (-v for verbose mode, -o to name an output file) are inherited from the StreamApp superclass. The unpacker in Example 4-9 looks similar when migrated to the OO framework, because the very notion of running a program has been given a standard structure.

Example 4-9. PP2E\System\App\Clients\unpackapp.py

#!/usr/bin/python
###########################################
# unpack a packapp.py output file;
# % unpackapp.py -i packed1 -v
# apptools.appRun('unpackapp.py', args...)
# apptools.appCall(UnpackApp, args...)
###########################################

import string
from textpack import marker
from PP2E.System.App.Kinds.redirect import StreamApp

class UnpackApp(StreamApp):
    def start(self):
        StreamApp.start(self)
        self.endargs(  )              # ignore more -o's, etc.
    def run(self):
        mlen = len(marker)
        while 1:
            line = self.readline(  )
            if not line: 
                break
            elif line[:mlen] != marker:
                self.write(line)
            else:
                name = string.strip(line[mlen:])
                self.message('creating: ' + name)
                self.setOutput(name)

if __name__ == '__main__':  UnpackApp(  ).main(  )

This subclass redefines start and run methods to do the right thing for this script -- prepare for and execute a file unpacking operation. All the details of parsing command-line arguments and redirecting standard streams are handled in superclasses:

C:\......\test\unpackapp>python ..\..\unpackapp.py -v -i ..\packedapp.all
UnpackApp start.
creating: spam.txt
creating: eggs.txt
creating: ham.txt
UnpackApp done.

C:\......\test\unpackapp>ls
eggs.txt  ham.txt   spam.txt

C:\......\test\unpackapp>type spam.txt
SPAM
spam

Running this script does the same job as the original textunpack.py, but we get command-line flags for free (-i specifies the input file). In fact, there are more ways to launch classes in this hierarchy than I have space to show here. A command line pair, -i -, for instance, makes the script read its input from stdin, as though it were simply piped or redirected in the shell:

C:\......\test\unpackapp>type ..\packedapp.all | python ..\..\unpackapp.py -i -
creating: spam.txt
creating: eggs.txt
creating: ham.txt

4.5.2 Application Hierarchy Superclasses

This section lists the source code of StreamApp and App -- the classes that do all this extra work on behalf of PackApp and UnpackApp. We don't have space to go through all this code in detail, so be sure to study these listings on your own for more information. It's all straight Python code.

I should also point out that the classes listed in this section are just the ones used by the object-oriented mutations of the textpack and textunpack scripts. They represent just one branch of an overall application framework class tree that you can study on this book's CD (see http://examples.oreilly.com/python2 and browse directory PP2E\System\App). Other classes in the tree provide command menus, internal string-based file streams, and so on. You'll also find additional clients of the hierarchy that do things like launch other shell tools, and scan Unix-style email mailbox files.

4.5.2.1 StreamApp: Adding stream redirection

StreamApp adds a few command-line arguments (-i, -o) and input/output stream redirection to the more general App root class listed later; App in turn defines the most general kinds of program behavior, to be inherited in Examples 4-8, 4-9, and 4-10, i.e., in all classes derived from App.

Example 4-10. PP2E\System\App\Kinds\redirect.py

################################################################################
# App subclasses for redirecting standard streams to files
################################################################################

import sys
from PP2E.System.App.Bases.app import App

################################################################################
# an app with input/output stream redirection
################################################################################

class StreamApp(App):
    def __init__(self, ifile='-', ofile='-'):
        App.__init__(self)                              # call superclass init
        self.setInput( ifile or self.name + '.in')      # default i/o file names
        self.setOutput(ofile or self.name + '.out')     # unless '-i', '-o' args

    def closeApp(self):                                 # not __del__
        try:
            if self.input != sys.stdin:                 # may be redirected
                self.input.close(  )                      # if still open
        except: pass
        try:
            if self.output != sys.stdout:               # don't close stdout!
                self.output.close(  )                     # input/output exist?
        except: pass

    def help(self):
        App.help(self)
        print '-i <input-file |"-">  (default: stdin  or per app)'
        print '-o <output-file|"-">  (default: stdout or per app)'

    def setInput(self, default=None):
        file = self.getarg('-i') or default or '-'
        if file == '-':
            self.input = sys.stdin
            self.input_name = '<stdin>'
        else:
            self.input = open(file, 'r')            # cmdarg | funcarg | stdin
            self.input_name = file                  # cmdarg '-i -' works too

    def setOutput(self, default=None):
        file = self.getarg('-o') or default or '-'
        if file == '-':
            self.output = sys.stdout
            self.output_name = '<stdout>'
        else:
            self.output = open(file, 'w')           # error caught in main(  )
            self.output_name = file                 # make backups too?

class RedirectApp(StreamApp):
    def __init__(self, ifile=None, ofile=None):
        StreamApp.__init__(self, ifile, ofile)
        self.streams = sys.stdin, sys.stdout
        sys.stdin    = self.input                 # for raw_input, stdin
        sys.stdout   = self.output                # for print, stdout

    def closeApp(self):                           # not __del__
        StreamApp.closeApp(self)                  # close files?
        sys.stdin, sys.stdout = self.streams      # reset sys files


############################################################
# to add as a mix-in (or use multiple-inheritance...)
############################################################

class RedirectAnyApp:
    def __init__(self, superclass, *args):
        apply(superclass.__init__, (self,) + args)
        self.super   = superclass
        self.streams = sys.stdin, sys.stdout
        sys.stdin    = self.input                 # for raw_input, stdin
        sys.stdout   = self.output                # for print, stdout

    def closeApp(self):                         
        self.super.closeApp(self)                 # do the right thing
        sys.stdin, sys.stdout = self.streams      # reset sys files

4.5.2.2 App: The root class

The top of the hierarchy knows what it means to be a shell application, but not how to accomplish a particular utility task (those parts are filled in by subclasses). App, listed in Example 4-11, exports commonly used tools in a standard and simplified interface, and a customizable start/run/stop method protocol that abstracts script execution. It also turns application objects into file-like objects: when an application reads itself, for instance, it really reads whatever source its standard input stream has been assigned to by other superclasses in the tree (like StreamApp).

Example 4-11. PP2E\System\App\Bases\app.py

################################################################################
# an application class hierarchy, for handling top-level components;
# App is the root class of the App hierarchy, extended in other files;
################################################################################

import sys, os, traceback
AppError = 'App class error'                              # errors raised here

class App:                                                # the root class
    def __init__(self, name=None):
        self.name    = name or self.__class__.__name__    # the lowest class
        self.args    = sys.argv[1:] 
        self.env     = os.environ
        self.verbose = self.getopt('-v') or self.getenv('VERBOSE')   
        self.input   = sys.stdin
        self.output  = sys.stdout 
        self.error   = sys.stderr                     # stdout may be piped
    def closeApp(self):                               # not __del__: ref's?
        pass                                          # nothing at this level
    def help(self):
        print self.name, 'command-line arguments:'    # extend in subclass
        print '-v (verbose)'

    ##############################
    # script environment services
    ##############################

    def getopt(self, tag):
        try:                                    # test "-x" command arg
            self.args.remove(tag)               # not real argv: > 1 App?
            return 1                   
        except:
            return 0
    def getarg(self, tag, default=None):
        try:                                    # get "-x val" command arg
            pos = self.args.index(tag)
            val = self.args[pos+1]
            self.args[pos:pos+2] = []
            return val
        except:
            return default                      # None: missing, no default
    def getenv(self, name, default=''):
        try:                                    # get "$x" environment var
            return self.env[name]
        except KeyError:
            return default
    def endargs(self):
        if self.args:
            self.message('extra arguments ignored: ' + `self.args`)
            self.args = []
    def restargs(self):
        res, self.args = self.args, []          # no more args/options
        return res
    def message(self, text):
        self.error.write(text + '\n')           # stdout may be redirected
    def exception(self):
        return (sys.exc_type, sys.exc_value)    # the last exception
    def exit(self, message='', status=1):
        if message: 
            self.message(message)
        sys.exit(status)
    def shell(self, command, fork=0, inp=''):
        if self.verbose:
            self.message(command)                         # how about ipc?
        if not fork:
            os.system(command)                            # run a shell cmd
        elif fork == 1:
            return os.popen(command, 'r').read(  )          # get its output
        else:                                             # readlines too?
            pipe = os.popen(command, 'w')      
            pipe.write(inp)                               # send it input
            pipe.close(  )

    #################################################
    # input/output-stream methods for the app itself; 
    # redefine in subclasses if not using files, or 
    # set self.input/output to file-like objects;
    #################################################

    def read(self, *size):       
        return apply(self.input.read, size)
    def readline(self):          
        return self.input.readline(  )
    def readlines(self):         
        return self.input.readlines(  )
    def write(self, text):       
        self.output.write(text)
    def writelines(self, text):  
        self.output.writelines(text)

    ###################################################
    # to run the app
    # main(  ) is the start/run/stop execution protocol;
    ###################################################

    def main(self):
        res = None
        try:
            self.start(  )
            self.run(  )
            res = self.stop(  )               # optional return val
        except SystemExit:                  # ignore if from exit(  )
            pass
        except:
            self.message('uncaught: ' + `self.exception(  )`)
            traceback.print_exc(  )
        self.closeApp(  )
        return res

    def start(self): 
        if self.verbose: self.message(self.name + ' start.')
    def stop(self): 
        if self.verbose: self.message(self.name + ' done.')
    def run(self):  
        raise AppError, 'run must be redefined!'

4.5.2.3 Why use classes here?

Now that I've listed all this code, some readers might naturally want to ask, "So why go to all this trouble?" Given the amount of extra code in the OO version of these scripts, it's a perfectly valid question. Most of the code listed in Example 4-11 is general-purpose logic, designed to be used by many applications. Still, that doesn't explain why the packapp and unpackapp OO scripts are larger than the original equivalent textpack and textunpack non-OO scripts.

The answers will become more apparent after the first few times you don't have to write code to achieve a goal, but there are some concrete benefits worth summarizing here:

Encapsulation: StreamApp clients need not remember all the system interfaces in Python, because StreamApp exports its own unified view. For instance, arguments, streams, and shell variables are split across Python modules (e.g., sys.argv, sys.stdout, os.environ); in these classes, they are all collected in the same single place.
Standardization: From the shell user's perspective, StreamApp clients all have a common look-and-feel, because they inherit the same interfaces to the outside world from their superclasses (e.g., -i and -v flags).
Maintenance: All the common code in the App and StreamApp superclasses must be debugged only once. Moreover, localizing code in superclasses makes it easier to understand and change in the future.
Reuse: Such a framework can provide an extra precoded utility we would otherwise have to recode in every script we write (command-line argument extraction, for instance). That holds true both now and in the future -- services added to the App root class become immediately usable and customizable among all applications derived from this hierarchy.
Utility: Because file access isn't hardcoded in PackApp and UnpackApp, they can easily take on new behavior, just by changing the class they inherit from. Given the right superclass, PackApp and UnpackApp could just as easily read and write to strings or sockets, as to text files and standard streams.
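The last point in the list above is easy to demonstrate with in-memory streams. The following sketch assumes a current Python release (it uses io.StringIO) and a made-up StringPacker class; it is not part of the framework, but it shows how code that writes only through a file-like attribute can target a string as easily as a file.

```python
import io

MARKER = ':' * 10 + 'textpak=>'              # same separator text as textpack.py

class StringPacker:
    """Hypothetical packer that writes only through self.output,
    which may be any object with a write method."""
    def __init__(self, output):
        self.output = output
    def pack(self, name, source):
        self.output.write(MARKER + name + '\n')
        for line in source:                  # file-like objects iterate by line
            self.output.write(line)

out = io.StringIO()                          # an in-memory "file"
packer = StringPacker(out)
packer.pack('spam.txt', io.StringIO('SPAM\nspam\n'))
packer.pack('eggs.txt', io.StringIO('EGGS\n'))
packed = out.getvalue()
```

Passing an open file or sys.stdout instead of the StringIO object would send the same text there, with no change to the class.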

Although it's not obvious until you start writing larger class-based systems, code reuse is perhaps the biggest win for class-based programs. For instance, in Chapter 9, we will reuse the OO-based packer and unpacker scripts by invoking them from a menu GUI like this:

from PP2E.System.App.Clients.packapp import PackApp
...get dialog inputs, glob filename patterns
app = PackApp(ofile=output)            # run with redirected output
app.args = filenames                   # reset cmdline args list
app.main(  )


from PP2E.System.App.Clients.unpackapp import UnpackApp
...get dialog input
app = UnpackApp(ifile=input)           # run with input from file
app.main(  )                             # execute app class

Because these classes encapsulate the notion of streams, they can be imported and called, not just run as top-level scripts. Further, their code is reusable two ways: not only do they export common system interfaces for reuse in subclasses, but they can also be used as software components as in the previous code listing. See the PP2E\Gui\Shellgui directory for the full source code of these clients.

Python doesn't impose OO programming, of course, and you can get a lot of work done with simpler functions and scripts. But once you learn how to structure class trees for reuse, going the extra OO mile usually pays off in the long run.

4.6 User-Friendly Program Launchers

Suppose, for just a moment, that you wish to ship Python programs to an audience that may be in the very early stages of evolving from computer user to computer programmer. Maybe you are shipping a Python application to nontechnical users; or perhaps you're interested in shipping a set of cool Python demo programs on a Python book's CD-ROM (see http://examples.oreilly.com/python2). Whatever the reason, some of the people who will use your software can't be expected to do any more than click a mouse -- much less edit their system configuration files to set things like PATH and PYTHONPATH per your programs' assumptions. Your software will have to configure itself.

Luckily, Python scripts can do that too. In the next two sections, we're going to see two modules that aim to automatically launch programs with minimal assumptions about the environment on the host machine:

Launcher.py is a library of tools for automatically configuring the shell environment in preparation for launching a Python script. It can be used to set required shell variables -- both the PATH system program search path (used to find the "python" executable), and the PYTHONPATH module search path (used to resolve imports within scripts). Because such variable settings made in a parent program are inherited by spawned child programs, this interface lets scripts preconfigure search paths for other scripts.
LaunchBrowser.py aims to portably locate and start an Internet browser program on the host machine to view a local file or remote web page. It uses tools in Launcher.py to search for a reasonable browser to run.

Both of these modules are designed to be reusable in any context where you want your software to be user-friendly. Because they search for files and configure environments automatically, your users can avoid (or at least postpone) having to learn the intricacies of environment configuration.
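The inheritance of environment settings that Launcher.py relies on can be seen in a few lines. The following is only a sketch in current Python 3: it uses the standard subprocess module in place of the os.spawnv and os.fork calls in the listings, and DEMO_SETTING is a made-up variable name.

```python
import os
import subprocess
import sys

# a setting made in the parent's os.environ ...
os.environ['DEMO_SETTING'] = 'configured-by-parent'   # hypothetical variable

# ... is visible to a child process started afterward
child_code = "import os; print(os.environ['DEMO_SETTING'])"
completed = subprocess.run([sys.executable, '-c', child_code],
                           capture_output=True, text=True, check=True)
child_saw = completed.stdout.strip()
```

The child is a separate Python process that never assigns the variable itself; it simply receives a copy of the parent's environment at startup. Changes made in the child, in turn, do not flow back to the parent.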

4.6.1 Launcher Module Clients

The two modules in this section see action in many of this book's examples. In fact, we've already used some of these tools. The launchmodes script we met at the end of the prior chapter imported Launcher functions to hunt for the local python.exe interpreter's path, needed by os.spawnv calls. That script could have assumed that everyone who installs it on their machine will edit its source code to add their own Python location; but the technical know-how required for even that task is already light-years beyond many potential users.^[4] It's much nicer to invest a negligible amount of startup time to locate Python automatically.

The two modules listed in Examples 4-14 and 4-15, together with launchmodes, also form the core of the demo-launcher programs at the top of the examples distribution on this book's CD (see http://examples.oreilly.com/python2). There's nothing quite like being able to witness programs in action first-hand, so I wanted to make it as easy as possible to launch Python examples in the book. Ideally, they should run straight off the CD when clicked, and not require readers to wade through a complex environment installation procedure.

However, many demos perform cross-directory imports, and so require the book's module package directories to be installed in PYTHONPATH; it is not enough just to click on some programs' icons at random. Moreover, when first starting out, users can't be assumed to have added the Python executable to their system search path either; the name "python" might not mean anything in the shell.

At least on platforms tested thus far, the following modules solve such configuration problems. For example, the script Launch_PyDemos.pyw in the root directory automatically configures the system and Python execution environments using Launcher.py tools, and then spawns PyDemos.py, a Tkinter GUI demo interface we'll meet later in this book. PyDemos in turn uses launchmodes to spawn other programs, which also inherit the environment settings made at the top. The net effect is that clicking any of the Launch_* scripts starts Python programs even if you haven't touched your environment settings at all.

You still need to install Python if it's not present, of course, but the Python Windows self-installer is a simple point-and-click affair too. Because searches and configuration take extra time, it's still to your advantage to eventually configure your environment settings and run programs like PyDemos directly, instead of through the launcher scripts. But there's much to be said for instant gratification when it comes to software.

These tools will show up in other contexts later in this text, too. For instance, the PyMail email interface we'll meet in Chapter 11 uses Launcher to locate its own source code file; since it's impossible to know what directory it will be run from, the best it can do is search. Another GUI example, big_gui, will use a similar Launcher tool to locate canned Python source-distribution demo programs in arbitrary and unpredictable places on the underlying computer.

The LaunchBrowser script in Example 4-15 also uses Launcher to locate suitable web browsers, and is itself used to start Internet demos in the PyDemos and PyGadgets launcher GUIs -- that is, Launcher starts PyDemos, which starts LaunchBrowser, which uses Launcher. By optimizing generality, these modules also optimize reusability.

4.6.2 Launching Programs Without Environment Settings

Because the Launcher.py file is heavily documented, I won't go over its fine points in narrative here. Instead, I'll just point out that all of its functions are useful by themselves, but the main entry point is the launchBookExamples function near the end; you need to work your way from the bottom of this file up to glimpse its larger picture.

The launchBookExamples function uses all the others, to configure the environment and then spawn one or more programs to run in that environment. In fact, the top-level demo launcher scripts shown in Examples 4-12 and 4-13 do nothing more than ask this function to spawn GUI demo interface programs we'll meet later (e.g., PyDemos.pyw, PyGadgets_bar.pyw). Because the GUIs are spawned indirectly through this interface, all programs they spawn inherit the environment configurations too.

Example 4-12. PP2E\Launch_PyDemos.pyw

#!/bin/env python
###############################################
# PyDemos + environment search/config first
# run this if you haven't setup your paths yet
# you still must install Python first, though
###############################################

import Launcher
Launcher.launchBookExamples(['PyDemos.pyw'], 0)

Example 4-13. PP2E\Launch_PyGadgets_bar.pyw

#!/bin/env python
##################################################
# PyGadgets_bar + environment search/config first
# run this if you haven't setup your paths yet
# you still must install Python first, though
##################################################

import Launcher
Launcher.launchBookExamples(['PyGadgets_bar.pyw'], 0)

When run directly, PyDemos.pyw and PyGadgets_bar.pyw instead rely on the configuration settings on the underlying machine. In other words, Launcher effectively hides configuration details from the GUI interfaces, by enclosing them in a configuration program layer. To understand how, study Example 4-14.

Example 4-14. PP2E\Launcher.py

#!/usr/bin/env python
"""
----------------------------------------------------------------------------
Tools to find files, and run Python demos even if your environment has
not been manually configured yet.  For instance, provided you have already
installed Python, you can launch Tk demos directly off the book's CD by 
double-clicking this file's icon, without first changing your environment
config files.  Assumes Python has been installed first (double-click on the
python self-install exe on the CD), and tries to guess where Python and the 
examples distribution live on your machine.  Sets Python module and system
search paths before running scripts: this only works because env settings 
are inherited by spawned programs on both windows and linux.  You may want
to tweak the list of directories searched for speed, and probably want to 
run one of the Config/setup-pp files at startup time to avoid this search.
This script is friendly to already-configured path settings, and serves to 
demo platform-independent directory path processing.  Python programs can 
always be started under the Windows port by clicking (or spawning a 'start'
DOS command), but many book examples require the module search path too.
----------------------------------------------------------------------------
"""

import sys, os, string


def which(program, trace=1):
    """
    Look for program in all dirs in the system's search 
    path var, PATH; return full path to program if found, 
    else None. Doesn't handle aliases on Unix (where we 
    could also just run a 'which' shell cmd with os.popen),
    and it might help to also check if the file is really 
    an executable with os.stat and the stat module, using
    code like this: os.stat(filename)[stat.ST_MODE] & 0111
    """
    try:
        ospath = os.environ['PATH']
    except:
        ospath = '' # okay if not set
    systempath = string.split(ospath, os.pathsep)
    if trace: print 'Looking for', program, 'on', systempath
    for sysdir in systempath:
        filename = os.path.join(sysdir, program)      # adds os.sep between
        if os.path.isfile(filename):                  # exists and is a file?
            if trace: print 'Found', filename
            return filename
        else:
            if trace: print 'Not at', filename
    if trace: print program, 'not on system path'
    return None


def findFirst(thisDir, targetFile, trace=0):    
    """
    Search directories at and below thisDir for a file
    or dir named targetFile.  Like find.find in standard
    lib, but no name patterns, follows unix links, and
    stops at the first file found with a matching name.
    targetFile must be a simple base name, not dir path.
    """
    if trace: print 'Scanning', thisDir
    for filename in os.listdir(thisDir):                    # skip . and ..
        if filename in [os.curdir, os.pardir]:              # just in case
            continue
        elif filename == targetFile:                        # check name match
            return os.path.join(thisDir, targetFile)        # stop at this one
        else: 
            pathname = os.path.join(thisDir, filename)      # recur in subdirs
            if os.path.isdir(pathname):                     # stop at 1st match
                below = findFirst(pathname, targetFile, trace)  
                if below: return below

       
def guessLocation(file, isOnWindows=(sys.platform[:3]=='win'), trace=1):
    """
    Try to find directory where file is installed
    by looking in standard places for the platform.
    Change tries lists as needed for your machine.
    """
    cwd = os.getcwd(  )                             # directory where py started
    tryhere = cwd + os.sep + file                 # or os.path.join(cwd, file)
    if os.path.exists(tryhere):                   # don't search if it is here
        return tryhere                            # findFirst(cwd,file) descends
    if isOnWindows:
        tries = []
        for pydir in [r'C:\Python20', r'C:\Program Files\Python']:
            if os.path.exists(pydir):
                tries.append(pydir)
        tries = tries + [cwd, r'C:\Program Files']
        for drive in 'CGDEF':
            tries.append(drive + ':\\')
    else:
        tries = [cwd, os.environ['HOME'], '/usr/bin', '/usr/local/bin']
    for dir in tries:
        if trace: print 'Searching for %s in %s' % (file, dir)
        try:
            match = findFirst(dir, file)
        except OSError: 
            if trace: print 'Error while searching', dir     # skip bad drives
        else:
            if match: return match
    if trace: print file, 'not found! - configure your environment manually'
    return None


PP2EpackageRoots = [                               # python module search path
   #'%sPP2E' % os.sep,                             # pass in your own elsewhere
    '']                                            # '' adds examplesDir root


def configPythonPath(examplesDir, packageRoots=PP2EpackageRoots, trace=1):
    """
    Setup the Python module import search-path directory 
    list as necessary to run programs in the book examples 
    distribution, in case it hasn't been configured already.
    Add examples package root, plus nested package roots.
    This corresponds to the setup-pp* config file settings.
    os.environ assignments call os.putenv internally in 1.5,
    so these settings will be inherited by spawned programs.
    Python source lib dir and '.' are automatically searched;
    unix|win os.sep is '/' | '\\', os.pathsep is ':' | ';'.
    sys.path is for this process only--must set os.environ.
    adds new dirs to front, in case there are two installs.
    could also try to run platform's setup-pp* file in this
    process, but that's non-portable, slow, and error-prone.
    """
    try:
        ospythonpath = os.environ['PYTHONPATH']
    except:
        ospythonpath = '' # okay if not set 
    if trace: print 'PYTHONPATH start:\n', ospythonpath
    addList = []
    for root in packageRoots:
        importDir = examplesDir + root
        if importDir in sys.path:
            if trace: print 'Exists', importDir
        else:
            if trace: print 'Adding', importDir
            sys.path.append(importDir)
            addList.append(importDir)
    if addList:
        addString = string.join(addList, os.pathsep) + os.pathsep
        os.environ['PYTHONPATH'] = addString + ospythonpath
        if trace: print 'PYTHONPATH updated:\n', os.environ['PYTHONPATH']
    else:
        if trace: print 'PYTHONPATH unchanged'


def configSystemPath(pythonDir, trace=1):
    """ 
    Add python executable dir to system search path if needed
    """
    try:
        ospath = os.environ['PATH']
    except:
        ospath = '' # okay if not set  
    if trace: print 'PATH start', ospath
    if (string.find(ospath, pythonDir) == -1 and                # not found?
        string.find(ospath, string.upper(pythonDir)) == -1):    # case diff?
        os.environ['PATH'] = ospath + os.pathsep + pythonDir
        if trace: print 'PATH updated:', os.environ['PATH']
    else:
        if trace: print 'PATH unchanged'


def runCommandLine(pypath, exdir, command, isOnWindows=0, trace=1):
    """
    Run python command as an independent program/process on 
    this platform, using pypath as the Python executable,
    and exdir as the installed examples root directory.
    Need full path to python on windows, but not on unix.
    On windows, a os.system('start ' + command) is similar,
    except that .py files pop up a dos console box for i/o.
    Could use launchmodes.py too but pypath is already known. 
    """
    command = exdir + os.sep + command          # rooted in examples tree
    os.environ['PP2E_PYTHON_FILE'] = pypath     # export directories for
    os.environ['PP2E_EXAMPLE_DIR'] = exdir      # use in spawned programs

    if trace: print 'Spawning:', command
    if isOnWindows:
        os.spawnv(os.P_DETACH, pypath, ('python', command))
    else:
        cmdargs = [pypath] + string.split(command)
        if os.fork(  ) == 0:
            os.execv(pypath, cmdargs)           # run prog in child process


def launchBookExamples(commandsToStart, trace=1):
    """
    Toplevel entry point: find python exe and 
    examples dir, config env, spawn programs
    """
    isOnWindows  = (sys.platform[:3] == 'win')
    pythonFile   = (isOnWindows and 'python.exe') or 'python'
    examplesFile = 'README-PP2E.txt'
    if trace: 
        print os.getcwd(  ), os.curdir, os.sep, os.pathsep
        print 'starting on %s...' % sys.platform

    # find python executable: check system path, then guess
    pypath = which(pythonFile) or guessLocation(pythonFile, isOnWindows) 
    assert pypath
    pydir, pyfile = os.path.split(pypath)               # up 1 from file
    if trace:
        print 'Using this Python executable:', pypath
        raw_input('Press <enter> key')
 
    # find examples root dir: check cwd and others
    expath = guessLocation(examplesFile, isOnWindows)
    assert expath
    updir  = string.split(expath, os.sep)[:-2]          # up 2 from file
    exdir  = string.join(updir,   os.sep)               # to PP2E pkg parent
    if trace:
        print 'Using this examples root directory:', exdir
        raw_input('Press <enter> key')
 
    # export python and system paths if needed
    configSystemPath(pydir)
    configPythonPath(exdir)
    if trace:
        print 'Environment configured'
        raw_input('Press <enter> key')

    # spawn programs
    for command in commandsToStart:
        runCommandLine(pypath, os.path.dirname(expath), command, isOnWindows)


if __name__ == '__main__':
    #
    # if no args, spawn all in the list of programs below
    # else rest of cmd line args give single cmd to be spawned
    #
    if len(sys.argv) == 1:
        commandsToStart = [
            'Gui/TextEditor/textEditor.pyw',        # either slash works
            'Lang/Calculator/calculator.py',        # os normalizes path
            'PyDemos.pyw',
           #'PyGadgets.py',
            'echoEnvironment.pyw'
        ]
    else:
        commandsToStart = [ string.join(sys.argv[1:], ' ') ]
    launchBookExamples(commandsToStart)
    import time
    if sys.platform[:3] == 'win': time.sleep(10)   # to read msgs if clicked

One way to understand the Launcher script is to trace the messages it prints along the way. When run by itself without a PYTHONPATH setting, the script finds a suitable Python and the examples root directory (by hunting for its README file), uses those results to configure PATH and PYTHONPATH settings if needed, and spawns a precoded list of program examples. To illustrate, here is a launch on Windows with an empty PYTHONPATH:

C:\temp\examples>set PYTHONPATH=

C:\temp\examples>python Launcher.py
C:\temp\examples . \ ;
starting on win32...
Looking for python.exe on ['C:\\WINDOWS', 'C:\\WINDOWS', 
'C:\\WINDOWS\\COMMAND', 'C:\\STUFF\\BIN.MKS', 'C:\\PROGRAM FILES\\PYTHON']
Not at C:\WINDOWS\python.exe
Not at C:\WINDOWS\python.exe
Not at C:\WINDOWS\COMMAND\python.exe
Not at C:\STUFF\BIN.MKS\python.exe
Found C:\PROGRAM FILES\PYTHON\python.exe
Using this Python executable: C:\PROGRAM FILES\PYTHON\python.exe
Press <enter> key
Using this examples root directory: C:\temp\examples
Press <enter> key
PATH start C:\WINDOWS;C:\WINDOWS;C:\WINDOWS\COMMAND;C:\STUFF\BIN.MKS;
C:\PROGRAM FILES\PYTHON
PATH unchanged
PYTHONPATH start:

Adding C:\temp\examples\Part3
Adding C:\temp\examples\Part2
Adding C:\temp\examples\Part2\Gui
Adding C:\temp\examples
PYTHONPATH updated:
C:\temp\examples\Part3;C:\temp\examples\Part2;C:\temp\examples\Part2\Gui;
C:\temp\examples;
Environment configured
Press <enter> key
Spawning: C:\temp\examples\Part2/Gui/TextEditor/textEditor.pyw
Spawning: C:\temp\examples\Part2/Lang/Calculator/calculator.py
Spawning: C:\temp\examples\PyDemos.pyw
Spawning: C:\temp\examples\echoEnvironment.pyw

Four programs are spawned with PATH and PYTHONPATH preconfigured according to the location of your Python interpreter program, the location of your examples distribution tree, and the list of required PYTHONPATH entries in script variable PP2EpackageRoots.
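The path-list bookkeeping involved can be sketched on its own. The following is a simplified variation written in current Python 3, not the module's code: the prepend_search_dirs name is hypothetical, it checks for duplicates against the variable's own value rather than sys.path, and it is demonstrated on a plain dictionary so that the real environment is left alone.

```python
import os

def prepend_search_dirs(varname, new_dirs, environ=os.environ):
    """Put new_dirs at the front of a path-list variable, skipping duplicates."""
    current = environ.get(varname, '')                  # '' if not set
    existing = [d for d in current.split(os.pathsep) if d]
    to_add = []
    for d in new_dirs:
        if d not in existing and d not in to_add:       # avoid redundant entries
            to_add.append(d)
    if to_add:
        # new directories go first, in case there are two installs
        environ[varname] = os.pathsep.join(to_add + existing)
    return to_add

# demonstrate on a plain dict instead of the real os.environ
env = {'PYTHONPATH': os.pathsep.join(['/old/one', '/old/two'])}
added = prepend_search_dirs('PYTHONPATH', ['/examples', '/old/one'], env)
```

When called with the default environ argument, the assignment goes to os.environ, so the new value would be inherited by programs spawned later.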

The PYTHONPATH directories that are added by preconfiguration steps may be different when you run this script, because the PP2EpackageRoots variable may have an arbitrarily different setting by the time this book's CD is burned. In fact, to make this example more interesting, the outputs listed were generated at a time when the book's PYTHONPATH requirements were much more complex than they are now:

PP2EpackageRoots = [
    '%sPart3' % os.sep,                    # python module search path
    '%sPart2' % os.sep,                    # required by book demos
    '%sPart2%sGui' % ((os.sep,)*2),
    '']                                    # '' adds examplesDir root

Since then, the tree has been reorganized so that only one directory needs to be added to the module search path -- the one containing the PP2E root directory. That makes it easier to configure (only one entry is added to PYTHONPATH now), but the code still supports a list of entries for generality. Like most developers, I can't resist playing with the directories.

When used by the PyDemos launcher script, Launcher does not pause for key presses along the way (the trace argument is passed in false). Here is the output generated when using the module to launch PyDemos with PYTHONPATH already set to include all the required directories; the script both avoids adding settings redundantly, and retains any existing settings already in your environment:

C:\PP2ndEd\examples>python Launch_PyDemos.pyw
Looking for python.exe on ['C:\\WINDOWS', 'C:\\WINDOWS', 
'C:\\WINDOWS\\COMMAND', 'C:\\STUFF\\BIN.MKS', 'C:\\PROGRAM FILES\\PYTHON']
Not at C:\WINDOWS\python.exe
Not at C:\WINDOWS\python.exe
Not at C:\WINDOWS\COMMAND\python.exe
Not at C:\STUFF\BIN.MKS\python.exe
Found C:\PROGRAM FILES\PYTHON\python.exe
PATH start C:\WINDOWS;C:\WINDOWS;C:\WINDOWS\COMMAND;C:\STUFF\BIN.MKS;
C:\PROGRAM FILES\PYTHON
PATH unchanged
PYTHONPATH start:
C:\PP2ndEd\examples\Part3;C:\PP2ndEd\examples\Part2;C:\PP2ndEd\examples\
Part2\Gui;C:\PP2ndEd\examples
Exists C:\PP2ndEd\examples\Part3
Exists C:\PP2ndEd\examples\Part2
Exists C:\PP2ndEd\examples\Part2\Gui
Exists C:\PP2ndEd\examples
PYTHONPATH unchanged
Spawning: C:\PP2ndEd\examples\PyDemos.pyw

And finally, here is the trace output of a launch on my Linux system; because Launcher is written with portable Python code and library calls, environment configuration and directory searches work just as well there:

[mark@toy ~/PP2ndEd/examples]$ unsetenv PYTHONPATH
[mark@toy ~/PP2ndEd/examples]$ python Launcher.py
/home/mark/PP2ndEd/examples . / :
starting on linux2...
Looking for python on ['/home/mark/bin', '.', '/usr/bin', '/usr/bin', '/usr/local/
bin', '/usr/X11R6/bin', '/bin', '/usr/X11R6/bin', '/home/mark/
bin', '/usr/X11R6/bin', '/home/mark/bin', '/usr/X11R6/bin']
Not at /home/mark/bin/python
Not at ./python
Found /usr/bin/python
Using this Python executable: /usr/bin/python
Press <enter> key
Using this examples root directory: /home/mark/PP2ndEd/examples
Press <enter> key
PATH start /home/mark/bin:.:/usr/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin:/bin:
/usr/X11R6/bin:/home/mark/bin:/usr/X11R6/bin:/home/mark/bin:/usr/X11R6/bin
PATH unchanged
PYTHONPATH start:

Adding /home/mark/PP2ndEd/examples/Part3
Adding /home/mark/PP2ndEd/examples/Part2
Adding /home/mark/PP2ndEd/examples/Part2/Gui
Adding /home/mark/PP2ndEd/examples
PYTHONPATH updated:
/home/mark/PP2ndEd/examples/Part3:/home/mark/PP2ndEd/examples/Part2:/home/
mark/PP2ndEd/examples/Part2/Gui:/home/mark/PP2ndEd/examples:
Environment configured
Press <enter> key
Spawning: /home/mark/PP2ndEd/examples/Part2/Gui/TextEditor/textEditor.py
Spawning: /home/mark/PP2ndEd/examples/Part2/Lang/Calculator/calculator.py
Spawning: /home/mark/PP2ndEd/examples/PyDemos.pyw
Spawning: /home/mark/PP2ndEd/examples/echoEnvironment.pyw

In all of these launches, the Python interpreter was found on the system search path, so no real searches were performed (the "Not at" lines near the top come from the module's which function). In a moment, we'll also use the Launcher's which and guessLocation functions to look for web browsers in a way that kicks off searches in standard install directory trees. Later in the book, we'll use this module in other ways -- for instance, to search for demo programs and source code files somewhere on the machine, with calls of this form:

C:\temp>python
>>> from PP2E.Launcher import guessLocation
>>> guessLocation('hanoi.py')
Searching for hanoi.py in C:\Program Files\Python
Searching for hanoi.py in C:\temp\examples
Searching for hanoi.py in C:\Program Files
Searching for hanoi.py in C:\
'C:\\PP2ndEd\\cdrom\\Python1.5.2\\SourceDistribution\\Unpacked\\Python-1.5.2
\\Demo\\tkinter\\guido\\hanoi.py'

>>> from PP2E.Launcher import findFirst
>>> findFirst('.', 'PyMailGui.py')
'.\\examples\\Internet\\Email\\PyMailGui.py'

Such searches aren't necessary if you can rely on an environment variable to give at least part of the path to a file; for instance, paths to scripts within the PP2E examples tree can be built by joining the value of the PP2EHOME shell variable with the rest of the script's path (assuming that the rest of the path won't change, and that the shell variable is set everywhere).

Some scripts may also be able to compose relative paths to other scripts using the sys.path[0] home-directory indicator added for imports (see Section 2.7). But in cases where a file can appear at arbitrary places, searches like those shown previously are sometimes the best scripts can do. The earlier hanoi.py program file, for example, can be anywhere on the underlying machine (if present at all); searching is a more user-friendly final alternative than simply giving up.
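The "try the cheap options first, search as a last resort" strategy described here might be sketched as follows in current Python 3. This is an illustration under stated assumptions, not the Launcher code: the locate and find_first names are hypothetical, the standard os.walk call replaces the hand-coded recursion (so directories may be visited in a different order than in findFirst), and unreadable directories are silently skipped.

```python
import os
import tempfile

def find_first(top, target):
    """Return the path of the first entry named target at or below top, else None."""
    for dirpath, dirnames, filenames in os.walk(top):
        if target in filenames or target in dirnames:
            return os.path.join(dirpath, target)
    return None

def locate(target, relpath, homevar='PP2EHOME', search_roots=('.',)):
    """Try an environment variable first; fall back on searching."""
    home = os.environ.get(homevar)
    if home:
        candidate = os.path.join(home, relpath)     # variable gives part of path
        if os.path.exists(candidate):
            return candidate
    for root in search_roots:                       # last resort: hunt for it
        match = find_first(root, target)
        if match:
            return match
    return None

# demonstrate with a throwaway directory tree and an unset variable name
with tempfile.TemporaryDirectory() as top:
    nested = os.path.join(top, 'Demo', 'tkinter')
    os.makedirs(nested)
    open(os.path.join(nested, 'hanoi.py'), 'w').close()
    expected = os.path.join(nested, 'hanoi.py')
    found = locate('hanoi.py', 'Demo/hanoi.py',
                   homevar='NO_SUCH_VARIABLE_XYZ', search_roots=(top,))
```

Returning None when nothing turns up leaves it to the caller to decide whether to ask the user, try other roots, or give up.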

Finding Programs on Windows

Per a tip from a Python Windows guru, it may also be possible to determine the location of the installed Python interpreter on Windows with platform-specific code like this:

import _winreg
try:
    keyname = ("SOFTWARE\\Microsoft\\Windows\\" +
               "CurrentVersion\\App Paths\\python.exe")
    pyexe   = _winreg.QueryValue(
                  _winreg.HKEY_LOCAL_MACHINE, keyname)
except _winreg.error:
    pyexe   = None                     # not found

This code uses the _winreg module (new as of Python 1.6) to find Python if it has been installed correctly. The same sort of code will work for most other well-installed applications (e.g., web browsers), but not for some other kinds of files (e.g., Python scripts). It's also too Windows-specific to cover further in this text; see Windows resources for more details.

4.6.3 Launching Web Browsers Portably

Web browsers can do amazing things these days. They can serve as document viewers, remote program launchers, database interfaces, media players, and more. Being able to open a browser on a local or remote page file from within a script opens up all kinds of interesting user-interface possibilities. For instance, a Python system might automatically display its HTML-coded documentation when needed, by launching the local web browser on the appropriate page file.^[5] Because most browsers know how to present pictures, audio files, and movie clips, opening a browser on such a file is also a simple way for scripts to deal with multimedia.

The last script listed in this chapter is less ambitious than Launcher.py, but equally reusable: LaunchBrowser.py attempts to provide a portable interface for starting a web browser. Because techniques for launching browsers vary per platform, this script provides an interface that aims to hide the differences from callers. Once launched, the browser runs as an independent program, and may be opened to view either a local file or a remote page on the Web.

Here's how it works. Because most web browsers can be started with shell command lines, this script simply builds and launches one as appropriate. For instance, to run a Netscape browser on Linux, a shell command of the form netscape url is run, where url begins with "file://" for local files, and "http://" for live remote-page accesses (this is per URL conventions we'll meet in more detail later, in Chapter 12). On Windows, a shell command like start url achieves the same goal. Here are some platform-specific highlights:

Windows platforms

On Windows, the script either opens browsers with DOS start commands, or searches for and runs browsers with the os.spawnv call. On this platform, browsers can usually be opened with simple start commands (e.g., os.system("start xxx.html")). Unfortunately, start relies on the underlying filename associations for web page files on your machine, picks a browser for you per those associations, and has a command-line length limit that this script might exceed for long local file paths or remote page addresses.

Because of that, this script falls back on running an explicitly named browser with os.spawnv, if requested or required. To do so, though, it must find the full path to a browser executable. Since it can't assume that users will add it to the PATH system search path (or this script's source code), the script searches for a suitable browser with Launcher module tools in both directories on PATH and in common places where executables are installed on Windows.

Unix-like platforms

On other platforms, the script relies on os.system and the system PATH setting on the underlying machine. It simply runs a command line naming the first browser on a candidates list that it can find on your PATH setting. Because it's much more likely that browsers are in standard search directories on platforms like Unix and Linux (e.g., /usr/bin), the script doesn't look for a browser elsewhere on the machine. Notice the & at the end of the browser command-line run; without it, os.system calls block on Unix-like platforms.
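The "first candidate found on PATH" step can be sketched in current Python 3 with the standard shutil.which function, which is swapped in here for the Launcher module's own which and additionally checks that the file is executable. The first_on_path name is hypothetical.

```python
import os
import shutil
import sys

def first_on_path(candidates, path=None):
    """Return (name, full_path) for the first candidate on the search path."""
    for name in candidates:
        full = shutil.which(name, path=path)       # None if not found
        if full:
            return name, full
    return None

browsers = ['netscape', 'mosaic', 'lynx']          # candidate names from the script
choice = first_on_path(browsers)                   # None if none is installed

# self-check with a program known to exist: the running interpreter
pyname = os.path.basename(sys.executable)
pydir = os.path.dirname(sys.executable)
found = first_on_path(['no-such-browser-xyz', pyname], path=pydir)
```

The order of the candidates list expresses preference, so the first name that resolves wins even if later ones are installed too.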

All of this is easily customized (this is Python code, after all), and you may need to add additional logic for other platforms. But on all of my machines, the script makes reasonable assumptions that allow me to largely forget most of the platform-specific bits previously discussed; I just call the same launchBrowser function everywhere. For more details, let's look at Example 4-15.

Example 4-15. PP2E\LaunchBrowser.py

#!/bin/env python
#################################################################
# Launch a web browser to view a web page, portably.  If run 
# in '-live' mode, assumes you have an Internet feed and opens
# a page at a remote site.  Otherwise, assumes the page is a 
# full file path name on your machine, and opens the page file
# locally.  On Unix/Linux, finds first browser on your $PATH.
# On Windows, tries DOS "start" command first, or searches for
# the location of a browser on your machine for os.spawnv by 
# checking PATH and common Windows executable directories. You 
# may need to tweak browser executable name/dirs if this fails.
# This has only been tested in Win98 and Linux, so you may need 
# to add more code for other machines (mac: ic.launchurl(url)?).
#################################################################

import os, sys
from Launcher import which, guessLocation     # file search utilities
useWinStart = 1                               # 0=ignore name associations
onWindows   = sys.platform[:3] == 'win'
helptext    = "Usage: LaunchBrowser.py [ -file path | -live path site ]"
#browser    = r'c:\"Program Files"\Netscape\Communicator\Program\netscape.exe'

# defaults
Mode = '-file'
Page = os.getcwd(  ) + '/Internet/Cgi-Web/PyInternetDemos.html'
Site = 'starship.python.net/~lutz'

def launchUnixBrowser(url, verbose=1):            # add your platform if unique
    tries = ['netscape', 'mosaic', 'lynx']        # order your preferences here
    for program in tries:
        if which(program): break                  # find one that is on $path
    else:
        assert 0, 'Sorry - no browser found'
    if verbose: print 'Running', program
    os.system('%s %s &' % (program, url))         # or fork+exec; assumes $path

def launchWindowsBrowser(url, verbose=1):
    if useWinStart and len(url) <= 400:           # on windows: start or spawnv
        try:                                      # spawnv works if cmd too long
            if verbose: print 'Starting'       
            os.system('start ' + url)             # try name associations first
            return                                # fails if cmdline too long
        except: pass
    browser = None                                # search for a browser exe
    tries   = ['IEXPLORE.EXE', 'netscape.exe']    # try explorer, then netscape
    for program in tries:
        browser = which(program) or guessLocation(program, 1)
        if browser: break
    assert browser != None, 'Sorry - no browser found'
    if verbose: print 'Spawning', browser
    os.spawnv(os.P_DETACH, browser, (browser, url))

def launchBrowser(Mode='-file', Page=Page, Site=None, verbose=1):
    if Mode == '-live':
        url = 'http://%s/%s' % (Site, Page)       # open page at remote site
    else:
        url = 'file://%s' % Page                  # open page on this machine
    if verbose: print 'Opening', url
    if onWindows:
        launchWindowsBrowser(url, verbose)        # use windows start, spawnv
    else:
        launchUnixBrowser(url, verbose)           # assume $path on unix, linux

if __name__ == '__main__':
    # get command-line args
    argc = len(sys.argv)
    if argc > 1:  Mode = sys.argv[1]
    if argc > 2:  Page = sys.argv[2]
    if argc > 3:  Site = sys.argv[3]
    if Mode not in ['-live', '-file']:
        print helptext
        sys.exit(1)
    else:
        launchBrowser(Mode, Page, Site)

4.6.3.1 Launching browsers with command lines

This module is designed to be both run and imported. When I run it by itself on my Windows machine, Internet Explorer starts up. The requested page file is always displayed in a new browser window when os.spawnv is used, but in the currently open browser window (if any) when a start command is run:

C:\...\PP2E>python LaunchBrowser.py
Opening file://C:\PP2ndEd\examples\PP2E/Internet/Cgi-Web/PyInternetDemos.html
Starting

The seemingly odd mix of forward and backward slashes in the URL here works fine within the browser; it pops up the window shown in Figure 4-2.

Figure 4-2. Launching a Windows browser on a local file

figs/ppy2_0402.gif

By default, a start command is spawned; to see the browser search procedure in action on Windows, set the script's useWinStart variable to 0. The script will then search for a browser in the directories on your PATH setting, and after that in common Windows install directories hardcoded in Launcher.py:

C:\...\PP2E>python LaunchBrowser.py 
                       -file C:\Stuff\Website\public_html\about-pp.html
Opening file://C:\Stuff\Website\public_html\about-pp.html
Looking for IEXPLORE.EXE on ['C:\\WINDOWS', 'C:\\WINDOWS', 
'C:\\WINDOWS\\COMMAND', 'C:\\STUFF\\BIN.MKS', 'C:\\PROGRAM FILES\\PYTHON']
Not at C:\WINDOWS\IEXPLORE.EXE
Not at C:\WINDOWS\IEXPLORE.EXE
Not at C:\WINDOWS\COMMAND\IEXPLORE.EXE
Not at C:\STUFF\BIN.MKS\IEXPLORE.EXE
Not at C:\PROGRAM FILES\PYTHON\IEXPLORE.EXE
IEXPLORE.EXE not on system path
Searching for IEXPLORE.EXE in C:\Program Files\Python
Searching for IEXPLORE.EXE in C:\PP2ndEd\examples\PP2E
Searching for IEXPLORE.EXE in C:\Program Files
Spawning C:\Program Files\Internet Explorer\IEXPLORE.EXE

If you study these trace messages, you'll notice that the browser wasn't on the system search path, but was eventually located in a local C:\Program Files subdirectory -- this is just the Launcher module's which and guessLocation functions at work. As coded, the script searches for Internet Explorer first; if that's not to your liking, try changing the script's tries list to make Netscape first:

C:\...\PP2E>python LaunchBrowser.py
Opening file://C:\PP2ndEd\examples\PP2E/Internet/Cgi-Web/PyInternetDemos.html
Looking for netscape.exe on ['C:\\WINDOWS', 'C:\\WINDOWS', 
'C:\\WINDOWS\\COMMAND', 'C:\\STUFF\\BIN.MKS', 'C:\\PROGRAM FILES\\PYTHON']
Not at C:\WINDOWS\netscape.exe
Not at C:\WINDOWS\netscape.exe
Not at C:\WINDOWS\COMMAND\netscape.exe
Not at C:\STUFF\BIN.MKS\netscape.exe
Not at C:\PROGRAM FILES\PYTHON\netscape.exe
netscape.exe not on system path
Searching for netscape.exe in C:\Program Files\Python
Searching for netscape.exe in C:\PP2ndEd\examples\PP2E
Searching for netscape.exe in C:\Program Files
Spawning C:\Program Files\Netscape\Communicator\Program\netscape.exe

Here, the script eventually found Netscape in a different install directory on the local machine. Besides automatically finding a user's browser for them, this script also aims to be portable. When running this file unchanged on Linux, the local Netscape browser starts, if it lives on your PATH; otherwise, others are tried:

[mark@toy ~/PP2ndEd/examples/PP2E]$ python LaunchBrowser.py
Opening file:///home/mark/PP2ndEd/examples/PP2E/Internet/Cgi-Web/PyInternetDemos.html
Looking for netscape on ['/home/mark/bin', '.', '/usr/bin', '/usr/bin',
'/usr/local/bin', '/usr/X11R6/bin', '/bin', '/usr/X11R6/bin', '/home/mark/bin',
'/usr/X11R6/bin', '/home/mark/bin', '/usr/X11R6/bin']
Not at /home/mark/bin/netscape
Not at ./netscape
Found /usr/bin/netscape
Running netscape
[mark@toy ~/PP2ndEd/examples/PP2E]$

I have Netscape installed, so running the script this way on my machine generates the window shown in Figure 4-3, seen under the KDE window manager.

Figure 4-3. Launching a browser on Linux

figs/ppy2_0403.gif
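
The "Looking for", "Not at", and "Found" lines in the traces above reflect a simple directory-by-directory search of the system path. The following is a minimal sketch of that idea, not the Launcher module's actual code: the name find_on_path and its arguments are hypothetical, and it is written for Python 3 rather than for the Python release used in this chapter's listings.

```python
import os

def find_on_path(program, path_dirs=None, trace=False):
    """Return the full path of the first file named program found
    in path_dirs, or None if no directory contains it.
    Hypothetical helper: a sketch, not the Launcher module's which."""
    if path_dirs is None:
        # assumption: fall back on the PATH environment variable
        raw = os.environ.get('PATH', '')
        path_dirs = [d for d in raw.split(os.pathsep) if d]
    for directory in path_dirs:
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate):
            if trace: print('Found', candidate)
            return candidate
        if trace: print('Not at', candidate)
    return None
```

In Python 3.3 and later, the standard library's shutil.which performs a similar search and also checks that the file it finds is executable, so a hand-coded search like this one is mostly of interest for its tracing.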

If you have an Internet connection, you can open pages at remote servers too -- the next command opens the root page at my site on the starship.python.net server, located somewhere on the East Coast the last time I checked:

C:\...\PP2E>python LaunchBrowser.py -live ~lutz starship.python.net
Opening http://starship.python.net/~lutz
Starting

In Chapter 8, we'll see that this script is also run to start Internet examples in the top-level demo launcher system: the PyDemos script presented in that chapter portably opens local or remote web page files with this button-press callback:

[File mode]
    pagepath = os.getcwd(  ) + '/Internet/Cgi-Web'
    demoButton('PyErrata',  
               'Internet-based errata report system',
               'LaunchBrowser.py -file %s/PyErrata/pyerrata.html' % pagepath)

[Live mode]
    site = 'starship.python.net/~lutz'
    demoButton('PyErrata',  
               'Internet-based errata report system',
               'LaunchBrowser.py -live PyErrata/pyerrata.html ' + site)

4.6.3.2 Launching browsers with function calls

Other programs can spawn LaunchBrowser.py command lines like those shown previously with tools like os.system, as usual; but since the script's core logic is coded in a function, it can just as easily be imported and called:

>>> from PP2E.LaunchBrowser import launchBrowser
>>> launchBrowser(Page=r'C:\Stuff\Website\Public_html\about-pp.html')
Opening file://C:\Stuff\Website\Public_html\about-pp.html
Starting
>>>

When called like this, launchBrowser isn't much different from spawning a start command on DOS or a netscape command on Linux, but the Python launchBrowser function is designed to be a portable interface for browser startup across platforms. Python scripts can use this interface to pop up local HTML documents in web browsers; on machines with live Internet links, this call even lets scripts open browsers on remote pages on the Web:

>>> launchBrowser(Mode='-live', Page='index.html', Site='www.python.org')
Opening http://www.python.org/index.html
Starting

>>> launchBrowser(Mode='-live', Page='~lutz/PyInternetDemos.html',
...                             Site='starship.python.net')
Opening http://starship.python.net/~lutz/PyInternetDemos.html
Starting

On my computer, the first call here opens a new browser window if needed (Internet Explorer on Windows), dials out through my modem, and fetches the Python home page from http://www.python.org; the same call works on both Windows and Linux -- not bad for a single function call. The second call does the same, but with a web demos page we'll explore later.
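
The URLs echoed in the "Opening" lines of these sessions come from simple string formatting on the mode, page, and site arguments. As a rough sketch of just that step, the function below repeats the script's two format expressions in isolation; the name build_url is hypothetical, and the code is written for Python 3.

```python
def build_url(mode='-file', page='', site=None):
    """Build the URL to open, using the same two string formats
    as the launchBrowser function in the listing.
    Hypothetical helper: the name and signature are assumptions."""
    if mode == '-live':
        return 'http://%s/%s' % (site, page)   # page at a remote site
    return 'file://%s' % page                  # page on this machine

# mirrors the interactive calls shown in this section
print(build_url('-live', 'index.html', 'www.python.org'))
print(build_url('-file', '/home/mark/index.html'))
```

Because the local path is pasted in unchanged, a Windows pathname keeps its backslashes in the resulting file URL, which is where the mix of slashes seen in the earlier traces comes from. In Python 3, pathlib's as_uri method is one way to produce a more conventional file URL from an absolute path.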

4.6.3.3 A Python "multimedia extravaganza"

I mentioned earlier that browsers are a cheap way to present multimedia. Alas, this sort of thing is best viewed live, so the best I can do is show startup commands here. The next command line and function call, for example, display two GIF images in Internet Explorer on my machine (be sure to use full local pathnames). The result of the first of these is captured in Figure 4-4.

C:\...\PP2E>python LaunchBrowser.py 
                         -file C:\PP2ndEd\examples\PP2E\Gui\gifs\hills.gif
Opening file://C:\PP2ndEd\examples\PP2E\Gui\gifs\hills.gif
Starting

C:\temp>python
>>> from LaunchBrowser import launchBrowser
>>> launchBrowser(Page=r'C:\PP2ndEd\examples\PP2E\Gui\gifs\mp_lumberjack.gif')
Opening file://C:\PP2ndEd\examples\PP2E\Gui\gifs\mp_lumberjack.gif
Starting

Figure 4-4. Launching a browser on an image file

figs/ppy2_0404.gif

The next command line and call open the sousa.au audio file on my machine too; the second of these downloads the file from http://www.python.org first. If all goes as planned, they'll make the Monty Python theme song play on your computer too:

C:\PP2ndEd\examples>python LaunchBrowser.py
                         -file C:\PP2ndEd\examples\PP2E\Internet\Ftp\sousa.au
Opening file://C:\PP2ndEd\examples\PP2E\Internet\Ftp\sousa.au
Starting

>>> launchBrowser(Mode='-live',
...               Site='www.python.org',
...               Page='ftp/python/misc/sousa.au',
...               verbose=0)
>>>

Of course, you could just pass these filenames to a spawned start command on Windows, or run the appropriate handler program directly with something like os.system. But opening these files in a browser is a more portable approach -- you don't need to keep track of a set of file-handler programs per platform. Provided your scripts use a portable browser launcher like LaunchBrowser, you don't even need to keep track of a browser per platform.
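
To make the per-platform choice concrete, here is a simplified sketch of the decision the script makes: a start command on Windows, and the first browser found on the search path elsewhere. It only builds the command string and never runs it; the names pick_unix_browser and launch_command are hypothetical, the Windows fallback to os.spawnv is left out, and the code assumes Python 3, where shutil.which is available.

```python
import shutil

# same preference order as the script's Unix tries list
UNIX_BROWSERS = ['netscape', 'mosaic', 'lynx']

def pick_unix_browser(candidates=UNIX_BROWSERS, find=shutil.which):
    """Return the first candidate that find() can locate, else None."""
    for program in candidates:
        if find(program):
            return program
    return None

def launch_command(url, platform, find=shutil.which):
    """Return the shell command that would open url on platform.
    Sketch only: builds the command, does not execute it."""
    if platform.startswith('win'):
        return 'start ' + url                  # use name associations
    program = pick_unix_browser(find=find)
    if program is None:
        raise RuntimeError('Sorry - no browser found')
    return '%s %s &' % (program, url)          # run in the background
```

Passing the search function in as the find argument is a design choice made for this sketch: it lets the logic be exercised on a machine that has none of these browsers installed.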

In closing, I want to point out that LaunchBrowser reflects browsers that I tend to use. For instance, it tries to find Internet Explorer before Netscape on Windows, and prefers Netscape over Mosaic and Lynx on Linux, but you should feel free to change these choices in your copy of the script. In fact, both LaunchBrowser and Launcher make a few heuristic guesses when searching for files that may not make sense on every computer. As always, hack on; this is Python, after all.

Reptilian Minds Think Alike

A postscript: roughly one year after I wrote the LaunchBrowser script, Python release 2.0 sprouted a new standard library module that serves a similar purpose: webbrowser.open(url) also attempts to provide a portable interface for launching browsers from scripts. This module is more complex, but is likely to support more options than the LaunchBrowser script presented here (e.g., Macintosh browsers are directly supported with the Mac ic.launchurl(url) call -- a call I'd add to LaunchBrowser too, if I had a Mac lying around the office). See the library manual in releases 2.0 and later for details.
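
For comparison, a script built on the standard module might look something like the following sketch. Only webbrowser.open is a real library call here; the open_page function, its arguments, and its opener parameter are hypothetical, and the code is written for Python 3.

```python
import webbrowser

def open_page(mode, page, site=None, opener=webbrowser.open):
    """Build a URL the way LaunchBrowser does, then hand it to
    opener, which defaults to the standard webbrowser.open.
    Hypothetical wrapper: the name and signature are assumptions."""
    if mode == '-live':
        url = 'http://%s/%s' % (site, page)    # remote page
    else:
        url = 'file://%s' % page               # local file
    opener(url)                                # browser choice is left to the library
    return url
```

A call such as open_page('-live', 'index.html', 'www.python.org') would then ask the library to pick and start a browser, with no per-platform search code in the script itself.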

Just before publication, I stumbled onto another script called FixTk.py in the lib-tk subdirectory of the Python source library; at least in Python 1.5.2, this script tries to locate the Tcl/Tk 8.0 DLLs on Windows by checking common install directories, in order to allow Python/Tkinter programs to work without Tcl/Tk PATH settings. It doesn't recursively search directory trees like the Launcher module presented in this chapter, and may be defunct by the time you read this (Tk is copied into Python's own install directory as of Python 2.0), but it is similar in spirit to some of the tools in this chapter's last section.

[1] I hope this doesn't happen -- such a change would be a major break from backward compatibility, and could impact Python systems all over the world. On the other hand, it's just a possibility for a future mutation of Python. I'm told that publishers of technical books love language changes, and this isn't a text on politics.

[2] See also the gzip module in the Python standard library; it provides tools for reading and writing gzip files, usually named with a .gz filename extension. It can be used to unpack gzipped files, and serves as an all-Python equivalent of the standard gzip and gunzip command-line utility programs. This module uses another called zlib that implements gzip-compatible data compression. In Python 2.0, see also the new zipfile module for handling ZIP format archives (different from gzip).

[3] It happens. In fact, most people who spend any substantial amount of time in cyberspace probably could tell a horror story or two. Mine goes like this: I had an account with an ISP that went completely offline for a few weeks in response to a security breach by an ex-employee. Worse, personal email was not only disabled, but queued up messages were permanently lost. If your livelihood depends on email and the Web as much as mine does, you'll appreciate the havoc such an outage can wreak.

[4] You gurus and wizards out there will just have to take my word for it. One of the very first things you learn from flying around the world teaching Python to beginners is just how much knowledge developers take for granted. In the book Learning Python, for example, my co-author and I directed readers to do things like "open a file in your favorite text editor" and "start up a DOS command console." We had no shortage of email from beginners wondering what in the world we meant.

[5] For example, the PyDemos demo bar GUI we'll meet in Chapter 8 has buttons that automatically open a browser on web pages related to this book when pressed -- the publisher's site, the Python home page, my update files, and so on.
